Bin Ma

ORCID: 0000-0002-9223-9654

Affiliations:
  • Alibaba Group, Speech Lab, Singapore
  • Nanyang Technological University, School of Computer Science and Engineering, Singapore
  • Institute for Infocomm Research, A*STAR, Singapore (since 2004)
  • University of Hong Kong, Hong Kong (PhD 2000)


According to our database, Bin Ma authored at least 268 papers between 1999 and 2024.

Bibliography

2024
Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization.
IEEE Signal Process. Lett., 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

Towards Audio Codec-based Speech Separation.
CoRR, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis.
CoRR, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2024

SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
Proceedings of the IEEE International Conference on Acoustics, 2024

Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
CoRR, 2023

deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition.
CoRR, 2023

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MossFormer: Pushing the Performance Limit of Monaural Speech Separation Using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions.
Proceedings of the IEEE International Conference on Acoustics, 2023

D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network Using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

Adaptive Knowledge Distillation Between Text and Speech Pre-Trained Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Contrastive Speech Mixup for Low-Resource Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

De'hubert: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Auxiliary Pooling Layer For Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023

Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages.
CoRR, 2022

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization.
CoRR, 2022

Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

M2Met: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression.
Proceedings of the IEEE International Conference on Acoustics, 2022

CPT: Cross-Modal Prefix-Tuning for Speech-To-Text Translation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multimodal Sentiment Analysis on Unaligned Sequences Via Holographic Embedding.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram.
Proceedings of the IEEE International Conference on Acoustics, 2021

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses.
Proceedings of the IEEE International Conference on Acoustics, 2021

Preventing Early Endpointing for Online Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Unified Speaker Adaptation Approach for ASR.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cross Attention with Monotonic Alignment for Speech Transformer.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Universal Speech Transformer.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech Transformer with Speaker Aware Persistent Memory.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Independent Language Modeling Architecture for End-To-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context.
IEEE Access, 2019

Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Language-Universal Mandarin-English Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Robust Audio-visual Speech Recognition Using Bimodal DFSMN with Multi-condition Training and Dropout Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Alibaba Speech Translation Systems for IWSLT 2018.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Spectral-domain speech enhancement for speech recognition.
Speech Commun., 2017

Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection.
IEEE J. Sel. Top. Signal Process., 2017

Filtering for Malice Through the Data Ocean: Large-Scale PHA Install Detection at the Communication Service Provider Level.
Proceedings of the Research in Attacks, Intrusions, and Defenses, 2017

Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin Speech.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Modification on LSA speech enhancement for speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Efficient methods to train multilingual bottleneck feature extractors for low resource keyword search.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adaptation of PLDA for multi-source text-independent speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Transfer learning for children's speech recognition.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Improving air traffic control speech intelligibility by reducing speaking rate effectively.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multilingual bottle-neck feature learning from untranscribed speech.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

I2R-NUS submission to oriental language recognition AP16-OL7 challenge.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Convolutional neural network with multi-task learning scheme for acoustic scene classification.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Low-resource spoken keyword search strategies in Georgian inspired by distinctive feature theory.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Exploration of Local Variability in Text-Independent Speaker Verification.
J. Signal Process. Syst., 2016

Large-scale characterization of non-native Mandarin Chinese spoken by speakers of European origin: Analysis on iCALL.
Speech Commun., 2016

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement.
CoRR, 2016

Fantastic 4 system for NIST 2015 Language Recognition Evaluation.
CoRR, 2016

I2R Submission to the 2015 NIST Language Recognition I-vector Challenge.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

The NNI Vietnamese Speech Recognition System for MediaEval 2016.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context Aware Mispronunciation Detection for Mandarin Pronunciation Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Approximate search of audio queries by using DTW with phone time boundary and data augmentation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Cross-lingual deep neural network based submodular unbiased data selection for low-resource keyword search.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Content-aware local variability vector for speaker verification with short utterance.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Acoustic Segment Modeling with Spectral Clustering Methods.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Corpus-based pronunciation variation rule analysis for Singapore English.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

The NNI Query-by-Example System for MediaEval 2015.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Goodness of tone (GOT) for non-native Mandarin tone recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Investigation of parametric rectified linear units for noise robust speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Joint environment and speaker normalization using factored front-end CMLLR.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Stress level detection using double-layer subband filter.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phonology-augmented statistical transliteration for low-resource languages.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Topic modeling for conference analytics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The RedDots platform for mobile crowd-sourcing of speech data.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The RedDots data collection for speaker recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phone-centric local variability vector for text-constrained speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Tokenizing fundamental frequency variation for Mandarin tone error detection.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A new study of GMM-SVM system for text-dependent speaker recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Submodular data selection with acoustic and phonetic features for automatic speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Unsupervised data selection and word-morph mixed language model for Tamil low-resource keyword search.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Low-resource keyword search strategies for Tamil.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Channel adaptation of PLDA for text-independent speaker verification.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Text-dependent speaker verification: Classifiers, databases and RSR2015.
Speech Commun., 2014

How We Found These Vulnerabilities in Android Applications.
Proceedings of the International Conference on Security and Privacy in Communication Networks, 2014

Text-Dependent Speaker Verification System in VHF Communication Channel.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Local Variability Modeling for Text-Independent Speaker Verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

The NNI Query-by-Example System for MediaEval 2014.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Multiple time-span feature fusion for deep neural network modeling.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Local variability vector for text-independent speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A graph-based Gaussian component clustering approach to unsupervised acoustic modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Virtual example for phonotactic language recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The NIST SRE summed channel speaker recognition system.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A minimal-resource transliteration framework for Vietnamese.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A whispered Mandarin corpus for speech technology applications.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Extended RSR2015 for text-dependent speaker verification over VHF channel.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Subspace Gaussian mixture model for computer-assisted language learning.
Proceedings of the IEEE International Conference on Acoustics, 2014

Imposture classification for text-dependent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2014

Modelling the alternative hypothesis for text-dependent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2014

Strategies for Vietnamese keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Minimum divergence estimation of speaker prior in multi-session PLDA scoring.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Spoken Language Recognition With Prosodic Features.
IEEE Trans. Speech Audio Process., 2013

Sparse Classifier Fusion for Speaker Verification.
IEEE Trans. Speech Audio Process., 2013

Shifted-Delta MLP Features for Spoken Language Recognition.
IEEE Signal Process. Lett., 2013

Spoken Language Recognition: From Fundamentals to Practice.
Proc. IEEE, 2013

Unsupervised mining of acoustic subword units with segment-level Gaussian posteriorgrams.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improved unsupervised NAP training dataset design for speaker recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multi-session PLDA scoring of i-vector for partially open-set speaker detection.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Large-scale characterization of Mandarin pronunciation errors made by native speakers of European languages.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A study on GMM-SVM with adaptive relevance factor and its comparison with i-vector and JFA for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2013

Anti-model KL-SVM-NAP system for NIST SRE 2012 evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2013

Broadcast news story segmentation using latent topics on data manifold.
Proceedings of the IEEE International Conference on Acoustics, 2013

Phonetically-constrained PLDA modeling for text-dependent speaker verification with multiple short utterances.
Proceedings of the IEEE International Conference on Acoustics, 2013

Joint analysis of vocal tract length and temporal information for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker clustering using vector representation with long-term feature for lecture speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Minimal-resource phonetic language models to summarize untranscribed speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data.
IEEE Trans. Speech Audio Process., 2012

Discriminative feature extraction for speech recognition using continuous output codes.
Pattern Recognit. Lett., 2012

Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features.
IEICE Trans. Inf. Syst., 2012

Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Variational Bayes logistic regression as regularized fusion for NIST SRE 2010.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Welcome message from the technical program chairs.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Unsupervised NAP Training Data Design for Speaker Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

PLDA Modeling in I-Vector and Supervector Space for Speaker Verification.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Ensemble Classifiers Using Unsupervised Data Selection for Speaker Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Acoustic TextTiling for story segmentation of spoken documents.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An acoustic segment modeling approach to query-by-example spoken term detection.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Speaker Verification With Feature-Space MAPLR Parameters.
IEEE Trans. Speech Audio Process., 2011

Error Corrective Fusion of Classifier Scores for Spoken Language Recognition.
IEICE Trans. Inf. Syst., 2011

Target-Aware Lattice Rescoring for Dialect Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech Indexing Using Semantic Context Inference.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Maximum Entropy Based Data Selection for Speaker Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Regularized Logistic Regression Fusion for Speaker Verification.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Factored covariance modeling for text-independent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Score fusion and calibration in multiple language detectors with large performance variation.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
TechWare: Speaker and Spoken Language Recognition Resources [Best of the Web].
IEEE Signal Process. Mag., 2010

Autonomous acoustic model adaptation for multilingual meeting transcription involving high- and low-resourced languages.
Proceedings of the 2nd Workshop on Spoken Language Technologies for Under-Resourced Languages, 2010

Detection target dependent score calibration for language recognition.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Parallel Acoustic Model Adaptation for Improving Phonotactic Language Recognition.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Factor analysis based spatial correlation modeling for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Frame selection of interview channel for NIST speaker recognition evaluation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Non-negative matrix factorization based discriminative features for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Building topic mixture language models using the document soft classification notion of topic models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

MAP estimation of subspace transform for speaker recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Phoneme lattice based TextTiling towards multilingual story segmentation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The estimation and kernel metric of spectral correlation for text-independent speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Selecting phonotactic features for language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker diarization in meeting audio for single distant microphone.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Towards long-range prosodic attribute modeling for language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Effects of the phonological relevance in speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Incorporating MAP estimation and covariance transform for SVM based speaker recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker characterization using long-term and temporal information.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Approaching human listener accuracy with modern speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A discriminative performance metric for GMM-UBM speaker identification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A study of term weighting in phonotactic approach to spoken language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Framewise Phone Classification Using Weighted Fuzzy Classification Rules.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Voice conversion: From spoken vowels to singing vowels.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Soft margin estimation of Gaussian mixture model parameters for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Speaker diarization system for RT07 and RT09 meeting room audio.
Proceedings of the IEEE International Conference on Acoustics, 2010

Prosodic attribute model for spoken language identification.
Proceedings of the IEEE International Conference on Acoustics, 2010

Error corrective classifier fusion for spoken Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Semi-supervised learning of language model using unsupervised topic model.
Proceedings of the IEEE International Conference on Acoustics, 2010

I2R Text-to-Speech System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009
A Target-Oriented Phonotactic Front-End for Spoken Language Recognition.
IEEE Trans. Speech Audio Process., 2009

Analysis and Selection of Prosodic Features for Asian Language Recognition.
Int. J. Asian Lang. Process., 2009

Speaker Characterization using Average Filtering and Two Space Fusions.
Int. J. Asian Lang. Process., 2009

Language models learning for domain-specific natural language user interaction.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2009

Large margin estimation of Gaussian mixture model parameters with extended baum-welch for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Target-aware language models for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speaker diarization for meeting room audio.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Discriminative feature transformation using output coding for speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Acoustic segment modeling for speaker recognition.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Joint map adaptation of feature transformation and Gaussian Mixture Model for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Cross-validation of multiple language recognition systems using pseudo keys.
Proceedings of the IEEE International Conference on Acoustics, 2009

Evaluation of a fused FM and cepstral-based speaker recognition system on the NIST 2008 SRE.
Proceedings of the IEEE International Conference on Acoustics, 2009

Exploiting prosodic information for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Analysis and Selection of Prosodic Features for Language Identification.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

A Lattice-Based Phonotactic Language Recognition System with CMLLR Adaptation and Its Implementation Issues.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

I2R Text-to-Speech System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

2008
Optimizing the Performance of Spoken Language Recognition With Discriminative Training.
IEEE Trans. Speech Audio Process., 2008

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR.
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, 2008

Self-Organized Clustering for Feature Mapping in Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

An Efficient Feature Selection Method for Speaker Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Using Pseudo-Key for Language Recognition System Design.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Discriminative Output Coding Features for Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Using MAP estimation of feature transformation for speaker recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Target-oriented phone selection from universal phone set for spoken language recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Fuzzy rule selection using Iterative Rule Learning for speech data classification.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Unsupervised pronunciation grammar growing using knowledge-based and data-driven approaches.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Discriminative learning for optimizing detection performance in spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Target-oriented phone tokenizers for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

I2R's Submission to Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

2007
Spoken Language Recognition Using Ensemble Classifiers.
IEEE Trans. Speech Audio Process., 2007

A Vector Space Modeling Approach to Spoken Language Identification.
IEEE Trans. Speech Audio Process., 2007

Using direction of arrival estimate and acoustic feature information in speaker diarization.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A Generalized Feature Transformation Approach for Channel Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spoken Language Recognition with Relevance Feedback.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminative Vector for Spoken Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Effects of Device Mismatch, Language Mismatch and Environmental Mismatch on Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

2006
A Comparative Study of Four Language Identification Systems.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006

Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

Minimum Classification Error Based Optimal Linear Combination for Spoken Language Identification.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Fusion of Acoustic and Tokenization Features for Speaker Recognition.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

The IIR Submission to CSLP 2006 Speaker Recognition Evaluation.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Speaker cluster based GMM tokenization for speaker recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Vector-based spoken language recognition using output coding.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Chinese Dialect Identification Using Tone Features Based on Pitch Flux.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
A phonotactic-semantic paradigm for automatic spoken document classification.
Proceedings of the SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

An acoustic segment modeling approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A text categorization approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Using Local & Global Phonotactic Features in Chinese Dialect Identification.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

A Phonotactic Language Model for Spoken Language Identification.
Proceedings of the ACL 2005, 2005

2004
Fuzzy logic decision fusion in a multimodal biometric system.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

English-Chinese bilingual text-independent speaker verification.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2002
A comparative study of several incremental adaptation algorithms for speaker adaptation.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Likelihood probability mismatch analysis and normalization in multilingual speech applications.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Multilingual speech recognition with language identification.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
Online adaptive learning of continuous-density hidden Markov models based on multiple-stream prior evolution and posterior pooling.
IEEE Trans. Speech Audio Process., 2001

2000
A study on acoustic modeling and adaptation in HMM-based speech recognition.
PhD thesis, 2000

Benchmark Results of Triphone-based Acoustic Modeling on HKU96 and HKU99 Putonghua Corpora.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

Robust speech recognition based on off-line elicitation of multiple priors and on-line adaptive prior fusion.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Efficient ML training of CDHMM parameters based on prior evolution, posterior intervention and feedback.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
On-line adaptive learning of CDHMM parameters based on multiple-stream prior evolution and posterior pooling.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

