Haizhou Li

ORCID: 0000-0001-9158-9401

Affiliations:
  • Chinese University of Hong Kong (Shenzhen), China
  • National University of Singapore, Department of Electrical and Computer Engineering, Singapore
  • Nanyang Technological University, Singapore (2006 - 2016)
  • Institute for Infocomm Research, A*STAR, Singapore (2003 - 2016)
  • University of New South Wales, Sydney, Australia (2011)
  • University of Eastern Finland, Kuopio, Finland (2009)
  • South China University of Technology, Guangzhou, China (PhD 1990)


According to our database, Haizhou Li authored at least 990 papers between 1993 and 2024.

Awards

IEEE Fellow 2014, "For leadership in multilingual speaker and language recognition".

Bibliography

2024
LC-TTFS: Toward Lossless Network Conversion for Spiking Neural Networks With TTFS Coding.
IEEE Trans. Cogn. Dev. Syst., October, 2024

EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network.
IEEE Trans. Cogn. Dev. Syst., October, 2024

Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

Event-Triggered Tracking Control for Nonlinear Systems With Prescribed Performance.
IEEE Trans. Syst. Man Cybern. Syst., June, 2024

A Hybrid Neural Coding Approach for Pattern Recognition With Spiking Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Brain Topology Modeling With EEG-Graphs for Auditory Spatial Attention Detection.
IEEE Trans. Biomed. Eng., January, 2024

Few-Shot Contrastive Transfer Learning With Pretrained Model for Masked Face Verification.
IEEE Trans. Multim., 2024

Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.
IEEE Trans. Multim., 2024

Accented Text-to-Speech Synthesis With Limited Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Overview of the Tenth Dialog System Technology Challenge: DSTC10.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Speech Separation With Pretrained Frontend to Minimize Domain Mismatch.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoders.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Contrastive Learning Based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition With Missing Modalities.
IEEE Trans. Affect. Comput., 2024

Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks.
IEEE Signal Process. Lett., 2024

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech.
IEEE Signal Process. Lett., 2024

Transferable Adversarial Attacks Against ASR.
IEEE Signal Process. Lett., 2024

Advancing speaker embedding learning: Wespeaker toolkit for research and production.
Speech Commun., 2024

Efficient spiking neural network design via neural architecture search.
Neural Networks, 2024

Intelligent event-based lip reading word classification with spiking neural networks using spatio-temporal attention features and triplet loss.
Inf. Sci., 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants.
CoRR, 2024

Multi-Level Speaker Representation for Target Speaker Extraction.
CoRR, 2024

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement.
CoRR, 2024

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech.
CoRR, 2024

Roadmap towards Superhuman Speech Understanding using Large Language Models.
CoRR, 2024

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling.
CoRR, 2024

FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency.
CoRR, 2024

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction.
CoRR, 2024

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions.
CoRR, 2024

Aligning Language Models Using Follow-up Likelihood as Reward Signal.
CoRR, 2024

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction.
CoRR, 2024

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion.
CoRR, 2024

E1 TTS: Simple and Fast Non-Autoregressive TTS.
CoRR, 2024

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection.
CoRR, 2024

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention.
CoRR, 2024

Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing.
CoRR, 2024

Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset.
CoRR, 2024

Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models.
CoRR, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words.
CoRR, 2024

ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting.
CoRR, 2024

Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis.
CoRR, 2024

Target Speech Diarization with Multimodal Prompts.
CoRR, 2024

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis.
CoRR, 2024

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
CoRR, 2024

Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation.
CoRR, 2024

Mamba in Speech: Towards an Alternative to Self-Attention.
CoRR, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
CoRR, 2024

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems.
CoRR, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention.
CoRR, 2024

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder.
CoRR, 2024

Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People.
CoRR, 2024

Fine-Grained Quantitative Emotion Editing for Speech Generation.
CoRR, 2024

Event-Driven Learning for Spiking Neural Networks.
CoRR, 2024

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing.
CoRR, 2024

Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System.
CoRR, 2024

A Non-Intrusive Approach to Assessing Dysarthria Severity: Advancing Clinical Diagnosis.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

Listen to the Speaker in Your Gaze.
Proceedings of the IEEE International Conference on Cybernetics and Intelligent Systems, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

CMB: A Comprehensive Medical Benchmark in Chinese.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

AceGPT, Localizing Large Language Models in Arabic.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Generative Expressive Conversational Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy.
Proceedings of the International Joint Conference on Neural Networks, 2024

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2024

SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2024

Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio.
Proceedings of the IEEE International Conference on Acoustics, 2024

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024

Prompt-Driven Target Speech Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2024

Robust Decoding of the Auditory Attention from EEG Recordings Through Graph Convolutional Networks.
Proceedings of the IEEE International Conference on Acoustics, 2024

DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

CrossTune: Black-Box Few-Shot Classification with Label Enhancement.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
DNN controlled adaptive front-end for replay attack detection systems.
Speech Commun., October, 2023

Achieving Green AI with Energy-Efficient Deep Learning Using Neuromorphic Computing.
Commun. ACM, July, 2023

Online Multi-Face Tracking With Multi-Modality Cascaded Matching.
IEEE Trans. Circuits Syst. Video Technol., June, 2023

A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks.
IEEE Trans. Neural Networks Learn. Syst., 2023

Separable Convolution Network With Dual-Stream Pyramid Enhanced Strategy for Speech Steganalysis.
IEEE Trans. Inf. Forensics Secur., 2023

Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

A Time-Frequency Attention Module for Neural Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Audio-Visual Cross-Attention Network for Robotic Speaker Tracking.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speech Synthesis With Mixed Emotions.
IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.
IEEE Trans. Affect. Comput., 2023

TTS-Guided Training for Accent Conversion Without Parallel Data.
IEEE Signal Process. Lett., 2023

Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis.
IEEE Signal Process. Lett., 2023

Time-Domain Speech Separation Networks With Graph Encoding Auxiliary.
IEEE Signal Process. Lett., 2023

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge.
CoRR, 2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.
CoRR, 2023

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs.
CoRR, 2023

LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding.
CoRR, 2023

Quantify Health-Related Atomic Knowledge in Chinese Medical Large Language Models: A Computational Analysis.
CoRR, 2023

AceGPT, Localizing Large Language Models in Arabic.
CoRR, 2023

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency.
CoRR, 2023

Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech.
CoRR, 2023

USED: Universal Speaker Extraction and Diarization.
CoRR, 2023

A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems.
CoRR, 2023

EEG-Derived Voice Signature for Attended Speaker Detection.
CoRR, 2023

CMB: A Comprehensive Medical Benchmark in Chinese.
CoRR, 2023

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text.
CoRR, 2023

Long Short-term Memory with Two-Compartment Spiking Neuron.
CoRR, 2023

Constant Sequence Extension for Fast Search Using Weighted Hamming Distance.
CoRR, 2023

Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation.
CoRR, 2023

Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing.
CoRR, 2023

Phoenix: Democratizing ChatGPT across Languages.
CoRR, 2023

Enhancing Subject-Independent EEG-Based Auditory Attention Decoding with WGAN and Pearson Correlation Coefficient.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

Relational Sentence Embedding for Flexible Semantic Matching.
Proceedings of the 8th Workshop on Representation Learning for NLP, 2023

GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning.
Proceedings of the Natural Language Processing and Chinese Computing, 2023

Disentangling Voice and Content with Self-Supervision for Speaker Recognition.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

XAnet: Cross-Attention Between EEG of Left and Right Brain for Auditory Attention Decoding.
Proceedings of the 11th International IEEE/EMBS Conference on Neural Engineering, 2023

Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommendation Systems.
Proceedings of the Fifth Knowledge-aware and Conversational Recommender Systems Workshop co-located with 17th ACM Conference on Recommender Systems (RecSys 2023), 2023

Speaker Extraction with Detection of Presence and Absence of Target Speakers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Target Active Speaker Detection with Audio-visual Cues.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Explicit Intensity Control for Accented Text-to-speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Local and Global Context Modeling with Relation Matching Task for Dialog Act Recognition.
Proceedings of the International Joint Conference on Neural Networks, 2023

Exploiting Modality-Invariant Feature for Robust Multimodal Emotion Recognition with Missing Modalities.
Proceedings of the IEEE International Conference on Acoustics, 2023

Ripple Sparse Self-Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

Token2vec: A Joint Self-Supervised Pre-Training Framework Using Unpaired Speech and Text.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speaker Recognition with Two-Step Multi-Modal Deep Cleansing.
Proceedings of the IEEE International Conference on Acoustics, 2023

ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Transcriber: Few-Shot Lyrics Transcription With Self-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response.
Proceedings of the IEEE International Conference on Acoustics, 2023

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

HuatuoGPT, Towards Taming Language Model to Be a Doctor.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

How Well Do Text Embedding Models Understand Syntax?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Zero-shot multi-speaker accent TTS with limited accent data.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Dynamic Transformers Provide a False Sense of Efficiency.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks.
IEEE Trans. Neural Networks Learn. Syst., 2022

EEG-Based Auditory Attention Detection via Frequency and Channel Neural Attention.
IEEE Trans. Hum. Mach. Syst., 2022

STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention From EEG.
IEEE Trans. Biomed. Eng., 2022

Selective Listening by Synchronizing Speech With Lips.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

USEV: Universal Speaker Extraction With Visual Cue.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Decoding Knowledge Transfer for Neural Text-to-Speech Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Deep Learning Approaches in Topics of Singing Information Processing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

A Unique ICASSP 2022: During an Unusual Time [Conference Highlights].
IEEE Signal Process. Mag., 2022

Speaker Extraction With Co-Speech Gestures Cue.
IEEE Signal Process. Lett., 2022

Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask.
IEEE Signal Process. Lett., 2022

Discriminative speaker embedding with serialized multi-layer multi-head attention.
Speech Commun., 2022

Emotional voice conversion: Theory, databases and ESD.
Speech Commun., 2022

Progressive Tandem Learning for Pattern Recognition With Deep Spiking Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Noise-robust voice conversion with domain adversarial training.
Neural Networks, 2022

Self-Supervised Learning With Segmental Masking for Speech Representation.
IEEE J. Sel. Top. Signal Process., 2022

I4U System Description for NIST SRE'20 CTS Challenge.
CoRR, 2022

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis.
CoRR, 2022

Mixed Emotion Modelling for Emotional Voice Conversion.
CoRR, 2022

A Focused Study on Sequence Length for Dialogue Summarization.
CoRR, 2022

The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Controllable Accented Text-to-Speech Synthesis.
CoRR, 2022

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception.
CoRR, 2022

PoLyScribers: Joint Training of Vocal Extractor and Lyrics Transcriber for Polyphonic Music.
CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

ESAA: An EEG-Speech Auditory Attention Detection Database.
Proceedings of the 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2022

Training Spiking Neural Networks with Local Tandem Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Deep residual spiking neural network for keyword spotting in low-resource settings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Knowledge distillation for In-memory keyword spotting model.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MEMOBERT: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Time-Frequency Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Supervised Speaker Recognition with Loss-Gated Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.
Proceedings of the IEEE International Conference on Acoustics, 2022

MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.
Proceedings of the IEEE International Conference on Acoustics, 2022

L-SpEx: Localized Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music.
Proceedings of the IEEE International Conference on Acoustics, 2022

Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.
Proceedings of the IEEE International Conference on Acoustics, 2022

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Analyzing and Evaluating Faithfulness in Dialogue Summarization.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022


M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Just Rank: Rethinking Evaluation with Word and Sentence Similarities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

D-Score: Holistic Dialogue Evaluation Without Reference.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Expressive TTS Training With Frame and Style Reconstruction Loss.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation.
IEEE Signal Process. Lett., 2021

NHSS: A speech and singing parallel database.
Speech Commun., 2021

An adaptive transmission line cochlear model based front-end for replay attack detection.
Speech Commun., 2021

Factorized WaveNet for voice conversion with limited data.
Speech Commun., 2021

FastTalker: A neural text-to-speech architecture with shallow and group autoregression.
Neural Networks, 2021

HuRAI: A brain-inspired computational model for human-robot auditory interface.
Neurocomputing, 2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.
CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis.
CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021

SLoClas: A Database for Joint Sound Localization and Classification.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Investigating the Impact of Pre-trained Language Models on Dialog Evaluation.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Capsule Network based End-to-end System for Detection of Replay Attacks.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phonetically Motivated Self-Supervised Speech Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Neural Speaker Extraction with Speaker-Speech Cross-Attention Network.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Diagnosis of COVID-19 Using Auditory Acoustic Cues.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rethinking Benchmarks for Neuromorphic Learning Algorithms.
Proceedings of the International Joint Conference on Neural Networks, 2021

GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks.
Proceedings of the 38th International Conference on Machine Learning, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

The Multi-Speaker Multi-Style Voice Cloning Challenge 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2021

MuSE: Multi-Modal Target Speaker Extraction with Visual Cues.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learning Disentangled Feature Representations for Speech Enhancement Via Adversarial Training.
Proceedings of the IEEE International Conference on Acoustics, 2021

Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals.
Proceedings of the IEEE International Conference on Acoustics, 2021

Data Augmentation with Signal Companding for Detection of Logical Access Attacks.
Proceedings of the IEEE International Conference on Acoustics, 2021

GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Revisiting Self-training for Few-shot Learning of Language Model.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Auditory Attention Detection with EEG Channel Attention.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

Low-Latency Auditory Spatial Attention Detection Based on Spectro-Spatial Features from EEG.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

SUTD-NUS System for Blizzard Challenge 2021.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

Exploring Teacher-Student Learning Approach for Multi-Lingual Speech-to-Intent Classification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Target Language Extraction at Multilingual Cocktail Parties.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Training Explainable Singing Quality Assessment Network with Augmented Data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Towards Reference-Independent Rhythm Assessment of Solo Singing.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

DynaEval: Unifying Turn and Dialogue Level Evaluation.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Bootstrapped Unsupervised Sentence Representation Learning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Significance of Subband Features for Synthetic Speech Detection.
IEEE Trans. Inf. Forensics Secur., 2020

SpEx: Multi-Scale Time Domain Speaker Extraction Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual Voice Conversion.
IEEE Signal Process. Lett., 2020

Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS.
IEEE Signal Process. Lett., 2020

DeepConversion: Voice conversion with limited parallel training data.
Speech Commun., 2020

An Efficient Threshold-Driven Aggregate-Label Learning Algorithm for Multimodal Information Processing.
IEEE J. Sel. Top. Signal Process., 2020

Supervised learning in spiking neural networks with synaptic delay-weight plasticity.
Neurocomputing, 2020

Two decades into Speaker Recognition Evaluation - are we there yet?
Comput. Speech Lang., 2020

HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation.
CoRR, 2020

Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network.
CoRR, 2020

Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks.
CoRR, 2020

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy.
CoRR, 2020

Spike-Timing-Dependent Back Propagation in Deep Spiking Neural Networks.
CoRR, 2020

The FFSVC 2020 Evaluation Plan.
CoRR, 2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Personalized Singing Voice Generation Using WaveRNN.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Deep AM-FM: Toolkit for Automatic Dialogue Evaluation.
Proceedings of the Conversational Dialogue Systems for the Next Decade, 2020

Automatic Rank-Ordering of Singing Vocals with Twin-Neural Network.
Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020

Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep Convolutional Spiking Neural Networks for Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Modal Attention for Speech Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SpEx+: A Complete Time Domain Speaker Extraction Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker-Utterance Dual Attention for Speaker and Utterance Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Code-Switching TTS with Cross-Lingual Language Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Independent Language Modeling Architecture for End-To-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Time-Domain Neural Network Approach for Speech Bandwidth Extension.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Effective Wavenet Adaptation for Voice Conversion with Limited Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Assessing the Scope of Generalized Countermeasures for Anti-Spoofing.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Teacher-Student Training For Robust Tacotron-Based TTS.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Real-Time Multiple Object Tracking with Discriminative Features.
Proceedings of the 16th International Conference on Control, 2020

Robust Real-time Face Tracking for People Wearing Face Masks.
Proceedings of the 16th International Conference on Control, 2020

Transformer-based Arabic Dialect Identification.
Proceedings of the International Conference on Asian Language Processing, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectral Features and Pitch Histogram for Automatic Singing Quality Evaluation with CRNN.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

HLT-NUS Submission for 2019 NIST Multimedia Speaker Recognition Evaluation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Classification of Speech with and without Face Mask using Acoustic Features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Modeling Code-Switch Languages Using Bilingual Parallel Corpus.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
A Cost-Sensitive Deep Belief Network for Imbalanced Classification.
IEEE Trans. Neural Networks Learn. Syst., 2019

Spike Timing or Rate? Neurons Learn to Make Decisions for Both Through Threshold-Driven Plasticity.
IEEE Trans. Cybern., 2019

Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes.
IEEE Signal Process. Mag., 2019

Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics.
Comput. Speech Lang., 2019

Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition.
CoRR, 2019

Automatic Lyrics Transcription in Polyphonic Music: Does Background Music Help?
CoRR, 2019

An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks.
CoRR, 2019

A Hybrid Learning Rule for Efficient and Rapid Inference with Spiking Neural Networks.
CoRR, 2019

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
CoRR, 2019

A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data.
CoRR, 2019

Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification.
CoRR, 2019

RSL2019: A Realistic Speech Localization Corpus.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Country Report - Singapore.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

First Leap Towards Development of Dialogue System for Autonomous Bus.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Large-Scale Speaker Diarization of Radio Broadcast Archives.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Graph Decoding for Code-Switching ASR.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Robust Sound Recognition: A Neuromorphic Approach.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Code-Switching Detection Using ASR-Generated Language Posteriors.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Target Speaker Extraction for Multi-Talker Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Unified Framework for Speaker and Utterance Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Linguistically Motivated Parallel Data Augmentation for Code-Switch Language Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic Modeling for Automatic Lyrics-to-Audio Alignment.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Adaptive-Q Cochlear Model for Replay Spoofing Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Long Range Acoustic Features for Spoofed Speech Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Competitive STDP-based Feature Representation Learning for Sound Event Classification.
Proceedings of the International Joint Conference on Neural Networks, 2019

Deep Spiking Neural Network with Spike Count based Learning Rule.
Proceedings of the International Joint Conference on Neural Networks, 2019

Neural Population Coding for Effective Temporal Classification.
Proceedings of the International Joint Conference on Neural Networks, 2019

Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss.
Proceedings of the IEEE International Conference on Acoustics, 2019

Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.
Proceedings of the IEEE International Conference on Acoustics, 2019

Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Word and Class Common Space Embedding for Code-switch Language Modelling.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

End-to-End Code-Switching ASR for Low-Resourced Language Pairs.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Time-Domain Speaker Extraction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

WaveNet Factorization with Singular Value Decomposition for Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Long Range Acoustic and Deep Features Perspective on ASVspoof 2019.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Allpass Modeling of Phase Spectrum of Speech Signals for Formant Tracking.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

SINGAN: Singing Voice Conversion with Generative Adversarial Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Multi-band Spectral Entropy Information for Detection of Replay Attacks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Domain Adversarial Training for Speech Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker-independent Spectral Mapping for Speech-to-Singing Conversion.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

MPD-AL: An Efficient Membrane Potential Driven Aggregate-Label Learning Algorithm for Spiking Neurons.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Generalizing I-Vector Estimation for Rapid Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Farewell Editorial.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Re-ranking spoken term detection with acoustic exemplars of keywords.
Speech Commun., 2018

Using language cluster models in hierarchical language identification.
Speech Commun., 2018

Is Neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domain.
CoRR, 2018

A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring.
CoRR, 2018

Generative X-Vectors for Text-Independent Speaker Verification.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Average Modeling Approach to Voice Conversion with Non-Parallel Data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Phonetically Aware Exemplar-Based Prosody Transformation.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Semi-supervised Lyrics and Solo-singing Alignment.
Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Mandarin-English Code-switching Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Co-whitening of I-vectors for Short and Long Duration Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Pronunciation Evaluation of Singing.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Biologically Plausible Speech Recognition Framework Based on Spiking Neural Networks.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

An Event-Based Cochlear Filter Temporal Encoding Scheme for Speech Signals.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

Single Channel Speech Separation with Constrained Utterance Level Permutation Invariant Training Using Grid LSTM.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

On the Importance of Analytic Phase of Speech Signals in Spoken Language Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Hierarchical Language Identification System.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.
Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018

Error Reduction Network for DBLSTM-based Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Extended Constant-Q Cepstral Coefficients for Detection of Spoofing Attacks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Analysis of Speech and Singing Signals for Temporal Alignment.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Use of Claimed Speaker Models for Replay Detection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Automatic Evaluation of Singing Quality without a Reference.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Second Order Factorized Model Adaptation for Short Duration Language Identification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Compensating Utterance Information in Fixed Phrase Speaker Verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Instantaneous Phase and Excitation Source Features for Detection of Replay Attacks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Named-Entity Tagging and Domain adaptation for Better Customized Translation.
Proceedings of the Seventh Named Entities Workshop, 2018

NEWS 2018 Whitepaper.
Proceedings of the Seventh Named Entities Workshop, 2018

Report of NEWS 2018 Named Entity Transliteration Shared Task.
Proceedings of the Seventh Named Entities Workshop, 2018

2017
An Exemplar-Based Approach to Frequency Warping for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Front-End for Antispoofing Countermeasures in Speaker Verification: Scattering Spectral Decomposition.
IEEE J. Sel. Top. Signal Process., 2017

Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection.
IEEE J. Sel. Top. Signal Process., 2017

Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

ISCA Medal for Scientific Achievement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Gain Compensation for Fast i-Vector Extraction Over Short Duration.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Investigating Scalability in Hierarchical Language Identification System.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adaptation of PLDA for multi-source text-independent speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

On the analysis and evaluation of prosody conversion techniques.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Named entity transliteration with sequence-to-sequence neural network.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

A review of the Mandarin-English code-switching corpus: SEAME.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Improving air traffic control speech intelligibility by reducing speaking rate effectively.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

A data-driven prognostics framework for tool remaining useful life estimation in tool condition monitoring.
Proceedings of the 22nd IEEE International Conference on Emerging Technologies and Factory Automation, 2017

Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Sparse representation of phonetic features for voice conversion with and without parallel data.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multilingual bottle-neck feature learning from untranscribed speech.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Improving N-gram language modeling for code-switching speech recognition.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A dual alignment scheme for improved speech-to-singing voice conversion.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

I2R-NUS submission to oriental language recognition AP16-OL7 challenge.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Transformation of prosody in voice conversion.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Perceptual evaluation of singing quality.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An integrated framework for multimodal human-robot interaction.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Low-resource spoken keyword search strategies in Georgian inspired by distinctive feature theory.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization.
J. Signal Process. Syst., 2016

Exploration of Local Variability in Text-Independent Speaker Verification.
J. Signal Process. Syst., 2016

A Spiking Neural Network System for Robust Sequence Recognition.
IEEE Trans. Neural Networks Learn. Syst., 2016

Total Variability Modeling Using Source-Specific Priors.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Large-scale characterization of non-native Mandarin Chinese spoken by speakers of European origin: Analysis on iCALL.
Speech Commun., 2016

On the study of replay and voice conversion attacks to text-dependent speaker verification.
Multim. Tools Appl., 2016

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation.
EURASIP J. Adv. Signal Process., 2016

Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting.
CoRR, 2016

Spoofing detection under noisy conditions: a preliminary investigation and an initial database.
CoRR, 2016

Fantastic 4 system for NIST 2015 Language Recognition Evaluation.
CoRR, 2016

How the Brain Formulates Memory: A Spatio-Temporal Model Research Frontier.
IEEE Comput. Intell. Mag., 2016

An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Rapid Computation of I-vector.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

I2R Submission to the 2015 NIST Language Recognition I-vector Challenge.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Voice conversion and spoofing countermeasures for speaker verification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

The NNI Vietnamese Speech Recognition System for MediaEval 2016.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Multi-channel feature adaptation for robust speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Neural networks based channel compensation for i-vector speaker verification.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A DNN-HMM Approach to Story Segmentation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context Aware Mispronunciation Detection for Mandarin Pronunciation Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Out of Set Language Modelling in Hierarchical Language Identification.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SERAPHIM Live! - Singing Synthesis for the Performer, the Composer, and the 3D Game Developer.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Audio and face video emotion recognition in the wild using deep neural networks and small datasets.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Approximate search of audio queries by using DTW with phone time boundary and data augmentation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Spoofing detection from a feature representation perspective.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Keyword search using query expansion for graph-based rescoring of hypothesized detections.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Cross-lingual deep neural network based submodular unbiased data selection for low-resource keyword search.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-based sparse representation of timbre and prosody for voice conversion.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A hierarchical framework for language identification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Combining multiple kernel models for automatic intelligibility detection of pathological speech.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Content-aware local variability vector for speaker verification with short utterance.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

I-vector based deep neural network acoustic model adaptation using multilingual language resource.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Beamforming networks using spatial covariance features for far-field speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Spoofing speech detection using temporal convolutional neural network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Computer-assisted pronunciation training: From pronunciation scoring towards spoken language learning.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Evaluating and Combining Name Entity Recognition Systems.
Proceedings of the Sixth Named Entity Workshop, 2016

Whitepaper of NEWS 2016 Shared Task on Machine Transliteration.
Proceedings of the Sixth Named Entity Workshop, 2016

Report of NEWS 2016 Machine Transliteration Shared Task.
Proceedings of the Sixth Named Entity Workshop, 2016

Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection.
Proceedings of the Recommendation and Search in Social Networks, 2015

Acoustic Segment Modeling with Spectral Clustering Methods.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Introduction to the Special Section on Continuous Space and Related Methods in Natural Language Processing.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Generalized Hough Transform for Speech Pattern Classification.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Decoupling Word-Pair Distance and Co-occurrence Information for Effective Long History Context Language Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Adequacy-Fluency Metrics: Evaluating MT in the Continuous Space Model Framework.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Quasi-Factorial Prior for i-vector Extraction.
IEEE Signal Process. Lett., 2015

Spoofing and countermeasures for speaker verification: A survey.
Speech Commun., 2015

Exemplar-based voice conversion using joint nonnegative matrix factorization.
Multim. Tools Appl., 2015

Mandarin-English code-switching speech corpus in South-East Asia: SEAME.
Lang. Resour. Evaluation, 2015

Context-dependent Phone Mapping for Acoustic Modeling of Under-resourced Languages.
Int. J. Asian Lang. Process., 2015

Visual Perception Based Engagement Awareness for Multiparty Human-Robot Interaction.
Int. J. Humanoid Robotics, 2015

Relevance factor of maximum a posteriori adaptation for GMM-NAP-SVM in speaker and language recognition.
Comput. Speech Lang., 2015

Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions.
Proceedings of the SIGDIAL 2015 Conference, 2015

Popular song summarization using chorus section detection from audio signal.
Proceedings of the 17th IEEE International Workshop on Multimedia Signal Processing, 2015

Octave-dependent Probabilistic Latent Semantic Analysis to Chorus Detection of Popular Song.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

The NNI Query-by-Example System for MediaEval 2015.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Sparse coding of total variability matrix.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Learning to estimate reverberation time in noisy and reverberant rooms.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Goodness of tone (GOT) for non-native Mandarin tone recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phonology-augmented statistical transliteration for low-resource languages.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An alternating optimization approach for phase retrieval.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The reddots platform for mobile crowd-sourcing of speech data.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The reddots data collection for speaker recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phonemes frequency based PLLR dimensionality reduction for language recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A real-time variable-q non-stationary Gabor transform for pitch shifting.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Spiking neural networks and the generalised Hough transform for speech pattern detection.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

TDTO language modeling with feedforward neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phone-centric local variability vector for text-constrained speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Formant excursion in singing synthesis.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A learning-based approach to direction of arrival estimation in noisy and reverberant environments.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Tokenizing fundamental frequency variation for Mandarin tone error detection.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Source-specific informative prior for i-vector extraction.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Combining robust spike coding with spiking neural networks for sound event classification.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Low-resource keyword search strategies for Tamil.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Channel adaptation of PLDA for text-independent speaker verification.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Performance scoring of singing voice.
Proceedings of the 2015 International Conference on Asian Language Processing, 2015

Joint Chinese word segmentation and punctuation prediction using deep recurrent neural network for social media data.
Proceedings of the 2015 International Conference on Asian Language Processing, 2015

Towards improving the performance of Vector Space Model for Chinese Frequently Asked Question Answering.
Proceedings of the 2015 International Conference on Asian Language Processing, 2015

The expression of singing emotion - contradicting the constraints of song.
Proceedings of the 2015 International Conference on Asian Language Processing, 2015

Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

On statistical machine translation method for lexicon refinement in speech recognition.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Detecting synthetic speech using long term magnitude and phase information.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Non-negative matrix factorization using stable alternating direction method of multipliers for source separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A density peak clustering approach to unsupervised acoustic subword units discovery.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

On the study of very low-resource language keyword search.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Mapping frames with DNN-HMM recognizer for non-parallel voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Multilingual exemplar-based acoustic model for the NIST Open KWS 2015 evaluation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Distance metric learning for kernel density-based acoustic model under limited training data conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Whitepaper of NEWS 2015 Shared Task on Machine Transliteration.
Proceedings of the Fifth Named Entity Workshop, 2015

Report of NEWS 2015 Machine Transliteration Shared Task.
Proceedings of the Fifth Named Entity Workshop, 2015

Fundamental frequency modeling using wavelets for emotional voice conversion.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

An Entorhinal-Hippocampal Model for Simultaneous Cognitive Map Building.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

CLARA: A Multifunctional Virtual Agent for Conference Support and Touristic Information.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Real-Time Keypoint Recognition Using Restricted Boltzmann Machine.
IEEE Trans. Neural Networks Learn. Syst., 2014

Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Text-dependent speaker verification: Classifiers, databases and RSR2015.
Speech Commun., 2014

Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages.
IEICE Trans. Inf. Syst., 2014

A Comparison of Categorical Attribute Data Clustering Methods.
Proceedings of the Structural, Syntactic, and Statistical Pattern Recognition, 2014

Why Industrial Robots Should Become More Social - On the Design of a Natural Language Interface for an Interactive Robot Welder.
Proceedings of the Social Robotics - 6th International Conference, 2014

System and keyword dependent fusion for spoken term detection.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Screen feedback in human-robot interaction: How to enhance robot expressiveness.
Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014

Text-Dependent Speaker Verification System in VHF Communication Channel.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Local Variability Modeling for Text-Independent Speaker Verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

The NNI Query-by-Example System for MediaEval 2014.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Acoustic emotion recognition based on fusion of multiple feature-dependent deep Boltzmann machines.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Local variability vector for text-independent speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Direction-driven navigation using cognitive map for mobile robots.
Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A deep neural network approach for sentence boundary detection in broadcast news.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Joint nonnegative matrix factorization for exemplar-based voice conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A graph-based Gaussian component clustering approach to unsupervised acoustic modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Virtual example for phonotactic language recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A minimal-resource transliteration framework for Vietnamese.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A comparative study of spectral transformation techniques for singing voice synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Extended RSR2015 for text-dependent speaker verification over VHF channel.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

I²R speech2singing perfects everyone's singing.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Feature compensation using linear combination of speaker and environment dependent correction vectors.
Proceedings of the IEEE International Conference on Acoustics, 2014

Subspace Gaussian mixture model for computer-assisted language learning.
Proceedings of the IEEE International Conference on Acoustics, 2014

Discriminative score normalization for keyword search decision.
Proceedings of the IEEE International Conference on Acoustics, 2014

Generalization of temporal filter and linear transformation for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Imposture classification for text-dependent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2014

Modelling the alternative hypothesis for text-dependent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2014

Wikipedia-based Kernels for dialogue topic tracking.
Proceedings of the IEEE International Conference on Acoustics, 2014

Intelligibility detection of pathological speech using asymmetric sparse kernel partial least squares classifier.
Proceedings of the IEEE International Conference on Acoustics, 2014

A discriminatively trained Hough Transform for frame-level phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Improving language modeling by using distance and co-occurrence information of word-pairs and its application to LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2014

Strategies for Vietnamese keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Minimum divergence estimation of speaker prior in multi-session PLDA scoring.
Proceedings of the IEEE International Conference on Acoustics, 2014

Learning optimal features for music transcription.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Towards better keyword search performance on Malay broadcast news data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

A study on replay attack and anti-spoofing for text-dependent speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Emotional facial expression transfer based on temporal restricted Boltzmann machines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Ensemble Nyström method for predicting conflict level from speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013
Rapid Feedforward Computation by Temporal Encoding and Learning With Spiking Neurons.
IEEE Trans. Neural Networks Learn. Syst., 2013

Dynamics Analysis of a Population Decoding Model.
IEEE Trans. Neural Networks Learn. Syst., 2013

Optimization Algorithms and Applications for Speech and Language Processing.
IEEE Trans. Speech Audio Process., 2013

Spoken Language Recognition With Prosodic Features.
IEEE Trans. Speech Audio Process., 2013

Sparse Classifier Fusion for Speaker Verification.
IEEE Trans. Speech Audio Process., 2013

Shifted-Delta MLP Features for Spoken Language Recognition.
IEEE Signal Process. Lett., 2013

Speech Information Processing: Theory and Applications [Scanning the Issue].
Proc. IEEE, 2013

Spoken Language Recognition: From Fundamentals to Practice.
Proc. IEEE, 2013

A Spike-Timing-Based Integrated Model for Pattern Recognition.
Neural Comput., 2013

Continuous attractors of discrete-time recurrent neural networks.
Neural Comput. Appl., 2013

Making Social Robots More Attractive: The Effects of Voice Pitch, Humor and Empathy.
Int. J. Soc. Robotics, 2013

Dynamical properties of continuous attractor neural network with background tuning.
Neurocomputing, 2013

A-STAR: Toward translating Asian spoken languages.
Comput. Speech Lang., 2013

Exemplar-based voice conversion using non-negative spectrogram deconvolution.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Building Companionship through Human-Robot Collaboration.
Proceedings of the Social Robotics - 5th International Conference, 2013

Screen feedback: How to overcome the expressive limitations of a social robot.
Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, 2013

The development and analysis of a Malay broadcast news corpus.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

RGB-D based cognitive map building and navigation.
Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013

Attribute-based histogram equalization (HEQ) and its adaptation for robust speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exemplar-based unit selection for voice conversion utilizing temporal information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Unsupervised mining of acoustic subword units with segment-level Gaussian posteriorgrams.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

GMM based speaker variability compensated system for INTERSPEECH 2013 ComParE emotion challenge.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multi-session PLDA scoring of i-vector for partially open-set speaker detection.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Context-dependent phone mapping for LVCSR of under-resourced languages.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Large-scale characterization of Mandarin pronunciation errors made by native speakers of European languages.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A dynamic Gaussian process for voice conversion.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

A study on GMM-SVM with adaptive relevance factor and its comparison with i-vector and JFA for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Temporal filter design by minimum KL divergence criterion for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Synthetic speech detection using temporal modulation feature.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2013

Language diarization for code-switch conversational speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Broadcast news story segmentation using latent topics on data manifold.
Proceedings of the IEEE International Conference on Acoustics, 2013

Phonetically-constrained PLDA modeling for text-dependent speaker verification with multiple short utterances.
Proceedings of the IEEE International Conference on Acoustics, 2013

Temporal coding of local spectrogram features for robust sound recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Minimal-resource phonetic language models to summarize untranscribed speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Recurrent neural network language modeling for code switching conversational speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Meaning Unit Segmentation in English and Chinese: a New Approach to Discourse Phenomena.
Proceedings of the Workshop on Discourse in Machine Translation, 2013

Constrained adaptation of histogram equalization for robust speech recognition.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Conditional restricted Boltzmann machine for voice conversion.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Language diarization for conversational code-switch speech with pronunciation dictionary adaptation.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Graph-based informative-sentence selection for opinion summarization.
Proceedings of the Advances in Social Networks Analysis and Mining 2013, 2013

Voice conversion and spoofing attack on speaker verification systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

A particle filter compensation approach to robust LVCSR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Modeling of term-distance and term-occurrence information for improving n-gram language model performance.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Robust Multiperson Detection and Tracking for Mobile Service and Social Robots.
IEEE Trans. Syst. Man Cybern. Part B, 2012

Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data.
IEEE Trans. Speech Audio Process., 2012

Low-Variance Multitaper MFCC Features: A Case Study in Robust Speaker Verification.
IEEE Trans. Speech Audio Process., 2012

Bitext Dependency Parsing With Auto-Generated Bilingual Treebank.
IEEE Trans. Speech Audio Process., 2012

Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion.
IEEE Signal Process. Lett., 2012

Discriminative feature extraction for speech recognition using continuous output codes.
Pattern Recognit. Lett., 2012

Learning regional transliteration variants.
Inf. Process. Manag., 2012

Modular IK: a Robust Inverse Kinematic Algorithm for Gesture Imitation in an Upper-Body Humanoid Robot.
Int. J. Humanoid Robotics, 2012

Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features.
IEICE Trans. Inf. Syst., 2012

Foreword.
IEICE Trans. Inf. Syst., 2012

Selective Gammatone Envelope Feature for Robust Sound Event Recognition.
IEICE Trans. Inf. Syst., 2012

Gesture Recognition Based on Localist Attractor Networks with Application to Robot Control [Application Notes].
IEEE Comput. Intell. Mag., 2012

Integration of language identification into a recognition system for spoken conversations containing code-Switches.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Variational Bayes logistic regression as regularized fusion for NIST SRE 2010.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Efficient Language Model Construction for Spoken Dialog Systems by Inducting Language Resources of Different Languages.
Proceedings of the Natural Interaction with Robots, 2012

Component Pluggable Dialogue Framework and Its Application to Social Robots.
Proceedings of the Natural Interaction with Robots, 2012

An analysis of vector Taylor series model compensation for non-stationary noise in speech recognition.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Context dependant phone mapping for cross-lingual acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Adaptive control for robot manipulators under ellipsoidal task space constraints.
Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012

Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

PLDA Modeling in I-Vector and Supervector Space for Speaker Verification.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Acoustic TextTiling for story segmentation of spoken documents.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Lasso environment model combination for robust speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An acoustic segment modeling approach to query-by-example spoken term detection.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A first speech recognition system for Mandarin-English code-switch conversational speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A bootstrapping approach for SLU portability to a new language by inducting unannotated user queries.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Generalized F0 modelling with absolute and relative pitch features for singing voice synthesis.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

I-vectors in the context of phonetically-constrained short utterances for speaker verification.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A Phone Mapping Technique for Acoustic Modeling of Under-Resourced Languages.
Proceedings of the 2012 International Conference on Asian Language Processing, 2012

Vision-based attention estimation and selection for social robot to perform natural interaction in the open world.
Proceedings of the International Conference on Human-Robot Interaction, 2012

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

PNCC-ivector-SRC based speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Report of NEWS 2012 Machine Transliteration Shared Task.
Proceedings of the 4th Named Entity Workshop, 2012

Whitepaper of NEWS 2012 Shared Task on Machine Transliteration.
Proceedings of the 4th Named Entity Workshop, 2012

Modeling the Translation of Predicate-Argument Structure for SMT.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

Utilizing Dependency Language Models for Graph-based Dependency Parsing Models.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

IRIS: a Chat-oriented Dialogue System based on the Vector Space Model.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012

2011
Beat space segmentation and octave scale cepstral feature for sung language recognition in pop music.
ACM Trans. Multim. Comput. Commun. Appl., 2011

Speaker Verification With Feature-Space MAPLR Parameters.
IEEE Trans. Speech Audio Process., 2011

A Maximum-Entropy Segmentation Model for Statistical Machine Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Sound Event Recognition With Probabilistic Distance SVMs.
IEEE Trans. Speech Audio Process., 2011

Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions.
IEEE Signal Process. Lett., 2011

Towards an Effective Design of Social Robots.
Int. J. Soc. Robotics, 2011

Error Corrective Fusion of Classifier Scores for Spoken Language Recognition.
IEICE Trans. Inf. Syst., 2011

Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives (Principe, J.; 2010) [Book Review].
IEEE Comput. Intell. Mag., 2011

Effective Large Scale Text Retrieval via Learning Risk-Minimization and Dependency-Embedded Model.
Proceedings of the Advances in Multimedia Modeling, 2011

Study on the Relevance Factor of Maximum a Posteriori with GMM for Language Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Feature Normalization Using Structured Full Transforms for Robust Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Target-Aware Lattice Rescoring for Dialect Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech Modulation Features for Robust Nonnative Speech Accent Detection.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Alternative Frequency Scale Cepstral Coefficient for Robust Sound Event Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Spoken Language Recognition in the Latent Topic Simplex.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech Indexing Using Semantic Context Inference.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Regularized Logistic Regression Fusion for Speaker Verification.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Image Representation of the Subband Power Distribution for Robust Sound Classification.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

CLGVSM: Adapting Generalized Vector Space Model to Cross-lingual Document Clustering.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Maximum likelihood adaptation of histogram equalization with constraint for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011

Factored covariance modeling for text-independent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Classifier subset selection and fusion for speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Score fusion and calibration in multiple language detectors with large performance variation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Jump Function Kolmogorov for overlapping audio event classification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Probabilistic distance SVM with Hellinger-Exponential Kernel for sound event classification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Joint Models for Chinese POS Tagging and Dependency Parsing.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

SMT Helps Bitext Dependency Parsing.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

A cross-domain adaptation method for sentiment classification using probabilistic latent analysis.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Report of NEWS 2011 Machine Transliteration Shared Task.
Proceedings of the 3rd Named Entities Workshop, 2011

Whitepaper of NEWS 2011 Shared Task on Machine Transliteration.
Proceedings of the 3rd Named Entities Workshop, 2011

Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

AM-FM: A Semantic Framework for Translation Quality Assessment.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

2010
Word level automatic alignment of music and lyrics using vocal synthesis.
ACM Trans. Multim. Comput. Commun. Appl., 2010

Statistical lattice-based spoken document retrieval.
ACM Trans. Inf. Syst., 2010

A discrete-time neural network for optimization problems with hybrid constraints.
IEEE Trans. Neural Networks, 2010

GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition.
IEEE Trans. Speech Audio Process., 2010

A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010

TechWare: Speaker and Spoken Language Recognition Resources [Best of the Web].
IEEE Signal Process. Mag., 2010

An overview of text-independent speaker recognition: From features to supervectors.
Speech Commun., 2010

A tree-construction search approach for multivariate time series motifs discovery.
Pattern Recognit. Lett., 2010

Memory Dynamics in Attractor Networks with Saliency Weights.
Neural Comput., 2010

Feature Integration and Dimension Reduction in Unit Selection TTS.
Int. J. Asian Lang. Process., 2010

Linguistically Annotated Reordering: Evaluation and Analysis.
Comput. Linguistics, 2010

Considering readability in text-to-speech recording script design.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Nonlinear Control of a Robot Manipulator with Time-Varying Uncertainties.
Proceedings of the Social Robotics - Second International Conference on Social Robotics, 2010

Autonomous acoustic model adaptation for multilingual meeting transcription involving high- and low-resourced languages.
Proceedings of the 2nd Workshop on Spoken Language Technologies for Under-Resourced Languages, 2010

BISTRA: Malay-English bidirectional speech translation.
Proceedings of the 2nd Workshop on Spoken Language Technologies for Under-Resourced Languages, 2010

Using design methodology to enhance interaction for a robotic receptionist.
Proceedings of the 19th IEEE International Conference on Robot and Human Interactive Communication, 2010

Detection target dependent score calibration for language recognition.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Parallel Acoustic Model Adaptation for Improving Phonotactic Language Recognition.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Learning Translation Boundaries for Phrase-Based Decoding.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

I²R's machine translation system for IWSLT 2010.
Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

Factor analysis based spatial correlation modeling for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Frame selection of interview channel for NIST speaker recognition evaluation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

UBM data selection for effective speaker modeling.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Aligning singing voice with MIDI melody using synthesized audio signal.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

The psychoacoustic approach towards enhancing speech intelligibility in noise.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Generating emotional speech from neutral speech.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Building topic mixture language models using the document soft classification notion of topic models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

MAP estimation of subspace transform for speaker recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Text-independent F0 transformation with non-parallel data for voice conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Phoneme lattice based texttiling towards multilingual story segmentation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The estimation and kernel metric of spectral correlation for text-independent speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Selecting phonotactic features for language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker diarization in meeting audio for single distant microphone.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Towards long-range prosodic attribute modeling for language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

SEAME: a Mandarin-English code-switching speech corpus in South-East Asia.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Incorporating MAP estimation and covariance transform for SVM based speaker recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Selective gammatone filterbank feature for robust sound event recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker characterization using long-term and temporal information.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Approaching human listener accuracy with modern speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Phonetic segmentation of singing voice using MIDI and parallel speech.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A discriminative performance metric for GMM-UBM speaker identification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Adaptive admittance control of a robot manipulator under task space constraint.
Proceedings of the IEEE International Conference on Robotics and Automation, 2010

Framewise Phone Classification Using Weighted Fuzzy Classification Rules.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Voice conversion: From spoken vowels to singing vowels.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Soft margin estimation of Gaussian mixture model parameters for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

An acoustic segment model approach to incorporating temporal information into speaker modeling for text-independent speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Speaker diarization system for RT07 and RT09 meeting room audio.
Proceedings of the IEEE International Conference on Acoustics, 2010

Prosodic attribute model for spoken language identification.
Proceedings of the IEEE International Conference on Acoustics, 2010

Tuning phone decoders for language identification.
Proceedings of the IEEE International Conference on Acoustics, 2010

Error corrective classifier fusion for spoken Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Feature integration for heart sound biometrics.
Proceedings of the IEEE International Conference on Acoustics, 2010

Semi-supervised learning of language model using unsupervised topic model.
Proceedings of the IEEE International Conference on Acoustics, 2010

A GMM-supervector approach to language recognition with adaptive relevance factor.
Proceedings of the 18th European Signal Processing Conference, 2010

Discrete expected likelihood kernel for SVM-based speaker verification.
Proceedings of the 18th European Signal Processing Conference, 2010

Non-Isomorphic Forest Pair Translation.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

How humans behave and evaluate a social robot in real-environment settings.
Proceedings of the ECCE 2010, 2010

Machine Transliteration: Leveraging on Third Languages.
Proceedings of the COLING 2010, 2010

Improving Name Origin Recognition with Context Features and Unlabelled Data.
Proceedings of the COLING 2010, 2010

EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora.
Proceedings of the COLING 2010, 2010

I2R Text-to-Speech System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

Whitepaper of NEWS 2010 Shared Task on Transliteration Generation.
Proceedings of the 2010 Named Entities Workshop, 2010

Report of NEWS 2010 Transliteration Generation Shared Task.
Proceedings of the 2010 Named Entities Workshop, 2010

Whitepaper of NEWS 2010 Shared Task on Transliteration Mining.
Proceedings of the 2010 Named Entities Workshop, 2010

Report of NEWS 2010 Transliteration Mining Shared Task.
Proceedings of the 2010 Named Entities Workshop, 2010

Convolution Kernel over Packed Parse Forest.
Proceedings of the ACL 2010, 2010

Error Detection for Statistical Machine Translation Using Linguistic Features.
Proceedings of the ACL 2010, 2010

Pseudo-Word for Phrase-Based Machine Translation.
Proceedings of the ACL 2010, 2010

2009
Jump function Kolmogorov for audio classification in noise-mismatch conditions.
IEEE Trans. Signal Process., 2009

A Target-Oriented Phonotactic Front-End for Spoken Language Recognition.
IEEE Trans. Speech Audio Process., 2009

Introduction to the Special Issue on Recent Advances in Asian Language Spoken Document Retrieval.
ACM Trans. Asian Lang. Inf. Process., 2009

An SVM Kernel With GMM-Supervector Based on the Bhattacharyya Distance for Speaker Recognition.
IEEE Signal Process. Lett., 2009

Analysis and Selection of Prosodic Features for Asian Language Recognition.
Int. J. Asian Lang. Process., 2009

Speaker Characterization using Average Filtering and Two Space Fusions.
Int. J. Asian Lang. Process., 2009

Readability Consideration in Speech Synthesis Recording Script Selection.
Int. J. Asian Lang. Process., 2009

A life-size robotic lion dance system with integrated motion control.
Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication, 2009

A Source Dependency Model for Statistical Machine Translation.
Proceedings of Machine Translation Summit XII: Posters, 2009

Efficient Beam Thresholding for Statistical Machine Translation.
Proceedings of Machine Translation Summit XII: Posters, 2009

Automated detection of kinks from blood vessels for optic cup segmentation in retinal images.
Proceedings of the Medical Imaging 2009: Computer-Aided Diagnosis, 2009

ARGALI: an automatic cup-to-disc ratio measurement system for glaucoma detection and AnaLysIs framework.
Proceedings of the Medical Imaging 2009: Computer-Aided Diagnosis, 2009

I2R's machine translation system for IWSLT 2009.
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2009, 2009

I²R's machine translation system for IWSLT 2009.
Proceedings of the 2009 International Workshop on Spoken Language Translation, 2009

Large margin estimation of Gaussian mixture model parameters with extended Baum-Welch for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Target-aware language models for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speaker diarization for meeting room audio.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Stream-based context-sensitive phone mapping for cross-lingual speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Discriminative feature transformation using output coding for speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Unit selection based speech synthesis for poor channel condition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Efficient sparse self-similarity matrix construction for repeating sequence detection.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Acoustic segment modeling for speaker recognition.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Harvesting Regional Transliteration Variants with Guided Search.
Proceedings of the Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, 2009

Joint MAP adaptation of feature transformation and Gaussian Mixture Model for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

A GMM supervector Kernel with the Bhattacharyya distance for SVM based speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Cross-validation of multiple language recognition systems using pseudo keys.
Proceedings of the IEEE International Conference on Acoustics, 2009

Speaker diarization in meeting audio.
Proceedings of the IEEE International Conference on Acoustics, 2009

Evaluation of a fused FM and cepstral-based speaker recognition system on the NIST 2008 SRE.
Proceedings of the IEEE International Conference on Acoustics, 2009

Cluster criterion functions in spectral subspace and their application in speaker clustering.
Proceedings of the IEEE International Conference on Acoustics, 2009

Exploiting prosodic information for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009


Sound event classification based on Feature Integration, Recursive Feature Elimination and Structured Classification.
Proceedings of the IEEE International Conference on Acoustics, 2009

Analysis and Selection of Prosodic Features for Language Identification.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

A Lattice-Based Phonotactic Language Recognition System with CMLLR Adaptation and Its Implementation Issues.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

Refining Unit Boundaries for Mandarin Text-to-Speech Database.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

Semi-supervised Learning of Domain-Specific Language Models from General Domain Data.
Proceedings of the 2009 International Conference on Asian Language Processing, 2009

An Interactive Robot Butler.
Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

Experiences with a Barista Robot, FusionBot.
Proceedings of the Progress in Robotics, 2009

K-Best Combination of Syntactic Parsers.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

Fast Translation Rule Matching for Syntax-based Statistical Machine Translation.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

Tree Kernel-based SVM with Structured Syntactic Knowledge for BTG-based Phrase Reordering.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

Nonlinear control approaches for SI engine model with uncertainties.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

I2R Text-to-Speech System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

A study on hidden Markov model's generalization capability for speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

The Asian network-based speech-to-speech translation system.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Whitepaper of NEWS 2009 Machine Transliteration Shared Task.
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, 2009

Report of NEWS 2009 Machine Transliteration Shared Task.
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, 2009

Forest-based Tree Sequence to String Translation Model.
Proceedings of the ACL 2009, 2009

A Syntax-Driven Bracketing Model for Phrase-Based Translation.
Proceedings of the ACL 2009, 2009

Topological Ordering of Function Words in Hierarchical Phrase-based Translation.
Proceedings of the ACL 2009, 2009

Transliteration Alignment.
Proceedings of the ACL 2009, 2009

MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval.
Proceedings of the ACL 2009, 2009

A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination.
Proceedings of the ACL 2009, 2009

2008
Optimizing the Performance of Spoken Language Recognition With Discriminative Training.
IEEE Trans. Speech Audio Process., 2008

Normalization of the Speech Modulation Spectra for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2008

On Acoustic Diversification Front-End for Spoken Language Identification.
IEEE Trans. Speech Audio Process., 2008

Active learning for constructing transliteration lexicons from the Web.
J. Assoc. Inf. Sci. Technol., 2008

Mining Live Transliterations Using Incremental Learning Algorithms.
Int. J. Comput. Process. Orient. Lang., 2008

Guest Editors' Introduction.
Int. J. Comput. Process. Orient. Lang., 2008

A lattice-based approach to query-by-example spoken document retrieval.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR.
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, 2008

Dimension reduction of the modulation spectrogram for speaker verification.
Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Effectiveness of Signal Segmentation for Music Content Representation.
Proceedings of the Advances in Multimedia Modeling, 2008

The TALP&I2R SMT systems for IWSLT 2008.
Proceedings of the 2008 International Workshop on Spoken Language Translation, 2008

I²R multi-pass machine translation system for IWSLT 2008.
Proceedings of the 2008 International Workshop on Spoken Language Translation, 2008

Self-Organized Clustering for Feature Mapping in Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Effect of Feature Smoothing for Robust Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

An Efficient Feature Selection Method for Speaker Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Using Pseudo-Key for Language Recognition System Design.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Predicting Spectral and Prosodic Parameters for Unit Selection in Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Discriminative Output Coding Features for Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

PLSA Based Topic Mixture Language Modeling Approach.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Using MAP estimation of feature transformation for speaker recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Target-oriented phone selection from universal phone set for spoken language recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Context-sensitive probabilistic phone mapping model for cross-lingual speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Multi-speaker meeting audio segmentation.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

T-test distance and clustering criterion for speaker diarization.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Rhythm based music segmentation and octave scale cepstral features for sung language recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Characterizing speech utterances for speaker verification with sequence kernel SVM.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech/laughter classification in meeting audio.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speaker identification in noise mismatch conditions based on jump function Kolmogorov analysis in wavelet domain.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Name Origin Recognition Using Maximum Entropy Model and Diverse Features.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Mining Transliterations from Web Query Results: An Incremental Approach.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Multi-View Co-Training of Transliteration Model.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Fuzzy rule selection using Iterative Rule Learning for speech data classification.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Speech enhancement for telephony name speech recognition.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Unsupervised pronunciation grammar growing using knowledge-based and data-driven approaches.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Discriminative learning for optimizing detection performance in spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Target-oriented phone tokenizers for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Robust phone set mapping using decision tree clustering for cross-lingual phone recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

On fusion of timbre-motivated features for singing voice detection and singer identification.
Proceedings of the IEEE International Conference on Acoustics, 2008

Spoken Language recognition using support vector machines with generative front-end.
Proceedings of the IEEE International Conference on Acoustics, 2008

Singing voice detection in pop songs using co-training algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2008

Jump function Kolmogorov and its application for audio stream segmentation and classification.
Proceedings of the IEEE International Conference on Acoustics, 2008

Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation.
Proceedings of the COLING 2008, 2008

Linguistically Annotated BTG for Statistical Machine Translation.
Proceedings of the COLING 2008, 2008

Regenerating Hypotheses for Statistical Machine Translation.
Proceedings of the COLING 2008, 2008

I2R's Submission to Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

Comparative Study of Several Novel Acoustic Features for Speaker Recognition.
Proceedings of the First International Conference on Biomedical Electronics and Devices, 2008

A Tree Sequence Alignment-based Tree-to-Tree Translation Model.
Proceedings of the ACL 2008, 2008

A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation.
Proceedings of the ACL 2008, 2008

Exploiting N-best Hypotheses for SMT Self-Enhancement.
Proceedings of the ACL 2008, 2008

2007
Exploring Vibrato-Motivated Acoustic Features for Singer Identification.
IEEE Trans. Speech Audio Process., 2007

Spoken Language Recognition Using Ensemble Classifiers.
IEEE Trans. Speech Audio Process., 2007

A Vector Space Modeling Approach to Spoken Language Identification.
IEEE Trans. Speech Audio Process., 2007

A phonetic similarity model for automatic extraction of transliteration pairs.
ACM Trans. Asian Lang. Inf. Process., 2007

Temporal Structure Normalization of Speech Feature for Robust Speech Recognition.
IEEE Signal Process. Lett., 2007

Evaluating Prosody of Mandarin Speech for Language Learning.
J. Chin. Lang. Comput., 2007

Singing voice detection using perceptually-motivated features.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Evaluating the temporal structure normalisation technique on the Aurora-4 task.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Fusion of contrastive acoustic models for parallel phonotactic spoken language identification.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A GMM-based probabilistic sequence kernel for speaker verification.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Using direction of arrival estimate and acoustic feature information in speaker diarization.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A Vector-Based Approach to Broadcast Audio Database Indexing and Retrieval.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

On Timbre Based Perceptual Feature for Singer Identification.
Proceedings of the 2007 International Computer Music Conference, 2007

A Generalized Feature Transformation Approach for Channel Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Normalizing the Speech Modulation Spectrum for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spoken Language Recognition with Relevance Feedback.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminative Vector for Spoken Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval.
Proceedings of the EMNLP-CoNLL 2007, 2007

Exploring Perceptual Based Timbre Feature for Singer Identification.
Proceedings of the Computer Music Modeling and Retrieval. Sense of Sounds, 2007

Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

Ordering Phrases with Function Words.
Proceedings of the ACL 2007, 2007

Semantic Transliteration of Personal Names.
Proceedings of the ACL 2007, 2007

2006
A Unit Selection-based Speech Synthesis Approach for Mandarin Chinese.
J. Chin. Lang. Comput., 2006

A Comparative Study of Four Language Identification Systems.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006

Music structure based vector space retrieval.
Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 2006

Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

Syllabic level automatic synchronization of music signals and text lyrics.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Minimum Classification Error Based Optimal Linear Combination for Spoken Language Identification.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Vector Autoregressive Model for Missing Feature Reconstruction.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Fusion of Acoustic and Tokenization Features for Speaker Recognition.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

The IIR Submission to CSLP 2006 Speaker Recognition Evaluation.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Temporal Discrete Cosine Transform: Towards Longer Term Temporal Features for Speaker Verification.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Meeting Segmentation Using Two-Layer Cascaded Subband Filters.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Analysis and detection of speech under sleep deprivation.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Speaker cluster based GMM tokenization for speaker recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Vector-based spoken language recognition using output coding.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A Hierarchical Approach for Music Chord Modeling Based on the Analysis of Tonal Characteristics.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Vibrato-Motivated Acoustic Features for Singer Identification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Bayesian Learning of N-Gram Statistical Language Modeling.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Learning Transliteration Lexicons from the Web.
Proceedings of the ACL 2006, 2006

2005
A phonotactic-semantic paradigm for automatic spoken document classification.
Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), 2005

Learning Phrase Translation using Level of Detail Approach.
Proceedings of Machine Translation Summit X: Papers, 2005

Complexity analysis of normal and deaf infant cry acoustic waves.
Proceedings of the Fourth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2005

Identifying singers of popular songs.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

An acoustic segment modeling approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Multilingual speech recognition: a unified approach.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A text categorization approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A probabilistic approach to prosodic word prediction for Mandarin Chinese TTS.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation.
Proceedings of the Natural Language Processing, 2005

Phrase-Based Statistical Machine Translation: A Level of Detail Approach.
Proceedings of the Natural Language Processing, 2005

Broadcast news segmentation by audio type analysis.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Using Local & Global Phonotactic Features in Chinese Dialect Identification.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

A Phonotactic Language Model for Spoken Language Identification.
Proceedings of the ACL 2005, 2005

2004
Language identification through large vocabulary continuous speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Grapheme-to-phoneme conversion for Chinese text-to-speech.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Direct Orthographical Mapping for Machine Transliteration.
Proceedings of the COLING 2004, 2004

A Joint Source-Channel Model for Machine Transliteration.
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004

2003
On unit analysis for Cantonese corpus-based TTS.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Equivalent node-based speech grammar optimization.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Likelihood probability mismatch analysis and normalization in multilingual speech applications.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Concatenative Chinese speech synthesis and quality evaluation.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Multilingual speech recognition with language identification.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2000
Semi-class-based N-gram Language Modeling for Chinese Dictation.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

1998
Chinese Word Segmentation.
Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation, 1998

Optimization of Parameter Tying for Chinese Acoustic Modeling.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

Data-driven Acoustic Modeling Approach for Chinese LVCSR.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

Chinese Sentence Tokenization Using Viterbi Decoder.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

Building class-based language models with contextual statistics.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1996
Speaker time-drifting adaptation using trajectory mixture hidden Markov models.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

Probabilistic mapping networks for speaker recognition.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995
Some nonparametric distance measures in speaker verification.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Speaker recognition with temporal transition models.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

On MMI learning of Gaussian mixture for speaker models.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

1993
Structured Specifications, Semantics, and System Semantics.
Proceedings of the SEKE'93, 1993

