Ming Li

ORCID: 0000-0002-6406-1983

Affiliations:
  • Duke Kunshan University, Data Science Research Center, China
  • Sun Yat-Sen University Carnegie Mellon University Joint Institute of Engineering, China (former)
  • University of Southern California, Los Angeles, CA, USA (former)
  • Chinese Academy of Sciences, Institute of Acoustics, China (former)


According to our database, Ming Li authored at least 197 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks.
Comput. Speech Lang., April, 2024

HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children.
IEEE Trans. Learn. Technol., 2024

Investigating Long-Term and Short-Term Time-Varying Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint Training on Multiple Datasets With Inconsistent Labeling Criteria for Facial Expression Recognition.
IEEE Trans. Affect. Comput., 2024

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning.
CoRR, 2024

Efficient Personal Voice Activity Detection with Wake Word Reference Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Voxblink: A Large Scale Speaker Verification Dataset on Camera.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Invertible Voice Conversion with Parallel Data.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Autistic Children's Social Skills.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

2023
A Complementary Dual-Branch Network for Appearance-Based Gaze Estimation From Low-Resolution Facial Image.
IEEE Trans. Cogn. Dev. Syst., September, 2023

Accurate Head Pose Estimation Using Image Rectification and a Lightweight Convolutional Neural Network.
IEEE Trans. Multim., 2023

Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Typical Facial Expression Network Using a Facial Feature Decoupler and Spatial-Temporal Learning.
IEEE Trans. Affect. Comput., 2023

Computer-Aided Autism Spectrum Disorder Diagnosis With Behavior Signal Processing.
IEEE Trans. Affect. Comput., 2023

STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition.
IEEE Trans. Affect. Comput., 2023

Cross-lingual multi-speaker speech synthesis with limited bilingual training data.
Comput. Speech Lang., 2023

End-to-end Online Speaker Diarization with Target Speaker Tracking.
CoRR, 2023

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus.
CoRR, 2023

The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023.
CoRR, 2023

VoxBlink: X-Large Speaker Verification Dataset on Camera.
CoRR, 2023

Graph Neural Network-Aided Exploratory Learning for Community Detection with Unknown Topology.
CoRR, 2023

Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion.
Biomed. Signal Process. Control., 2023

Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Robust Audio Anti-spoofing Countermeasure with Joint Training of Front-end and Back-end Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Pretraining Conformer with ASR for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Waveform Boundary Detection for Partially Spoofed Audio.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Identifying Source Speakers for Voice Conversion Based Spoofing Attacks on Speaker Verification Systems.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy.
EURASIP J. Audio Speech Music. Process., 2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios (V1).
CoRR, 2022

Invertible Voice Conversion.
CoRR, 2022

Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words.
CoRR, 2022

Low-Latency Online Speaker Diarization with Graph-Based Label Generation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Generating TTS Based Adversarial Samples for Training Wake-Up Word Detection Systems Against Confusing Words.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Single-Channel Target Speaker Separation Using Joint Training with Target Speaker's Pitch Information.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Online Target Speaker Voice Activity Detection for Speaker Diarization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System for Both Human Beings and Machines.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for the M2met Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Incorporating End-to-End Framework Into Target-Speaker Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

The DKU Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Facial Expression Recognition with Identity and Emotion Joint Learning.
IEEE Trans. Affect. Comput., 2021

Discriminative Dictionary Learning for Autism Spectrum Disorder Identification.
Frontiers Comput. Neurosci., 2021

Online Speaker Diarization with Graph-based Label Generation.
CoRR, 2021

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
CoRR, 2021

The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge.
CoRR, 2021

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge.
CoRR, 2021

Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication.
CoRR, 2021

Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss.
CoRR, 2021

Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion.
CoRR, 2021

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge.
CoRR, 2021

Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Binary Neural Network for Speaker Verification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AISHELL-3: A Multi-Speaker Mandarin TTS Corpus.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Our Learned Lessons from Cross-Lingual Speaker Verification: The CRMI-DKU System Description for the Short-Duration Speaker Verification Challenge 2021.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Multimodal Dynamic Neural Network for Call for Help Recognition in Elevators.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

Call For Help Detection In Emergent Situations Using Keyword Spotting And Paralinguistic Analysis.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

Cross-modal Assisted Training for Abnormal Event Recognition in Elevators.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

An Iterative Framework for Self-Supervised Deep Speaker Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification.
CoRR, 2020

Training Wake Word Detection with Synthesized Speech Data on Confusion Words.
CoRR, 2020

AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines.
CoRR, 2020

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling.
CoRR, 2020

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario.
CoRR, 2020

Multi-task Learning with Alignment Loss for Far-field Small-Footprint Keyword Spotting.
CoRR, 2020

The FFSVC 2020 Evaluation Plan.
CoRR, 2020

Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Attentive Similarity Measurement Strategies in Speaker Diarization.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Atss-Net: Target Speaker Separation via Attention-Based Neural Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

RWF-2000: An Open Large Scale Video Database for Violence Detection.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

HI-MIA: A Far-Field Text-Dependent Speaker Verification Database and the Baselines.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Within-Sample Variability-Invariant Loss for Robust Speaker Recognition Under Noisy Environments.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The Duke Entry for 2020 Blizzard Challenge.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019
String Stability Analysis for Vehicle Platooning Under Unreliable Communication Links With Event-Triggered Strategy.
IEEE Trans. Veh. Technol., 2019

An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder.
Comput. Speech Lang., 2019

The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Survey Talk: End-to-End Deep Neural Network Based Speaker and Language Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fixation Based Object Recognition in Autism Clinic Setting.
Proceedings of the Intelligent Robotics and Applications - 12th International Conference, 2019

F0 Contour Estimation Using Phonetic Feature in Electrolaryngeal Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

The DKU Speech Synthesis System for 2019 Blizzard Challenge.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

DKU-Tencent Submission to Oriental Language Recognition AP18-OLR Challenge.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Facial Expression Recognition with Identity and Spatial-temporal Integrated Learning.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

2018
Cancellable speech template via random binary orthogonal matrices projection hashing.
Pattern Recognit., 2018

Insights into End-to-End Learning Scheme for Language Identification.
CoRR, 2018

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Unsupervised query by example spoken term detection using features concatenated with Self-Organizing Map distances.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

End-to-end Language Identification using NetFV and NetVLAD.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Analysis of Length Normalization in End-to-End Speaker Verification System.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Insights into End-to-End Learning Scheme for Language Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid.
IEEE Trans. Smart Grid, 2017

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

SphereFace: Deep Hypersphere Embedding for Face Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Response to name: A dataset and a multimodal machine learning framework towards autism study.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

Automatic emotional spoken language text corpus construction from written dialogs in fictions.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

2016
Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification.
J. Signal Process. Syst., 2016

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.
Comput. Speech Lang., 2016

Speaker diarization system for autism children's real-life audio data.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Text-independent voice conversion using deep neural network based phonetic level features.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Locality sensitive discriminant analysis for speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

On Order-Constrained Transitive Distance Clustering.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Automatic intelligibility classification of sentence-level pathological speech.
Comput. Speech Lang., 2015

The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge.
CoRR, 2015

Speech bandwidth expansion based on deep neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Locality constrained transitive distance clustering on speech data.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Duration dependent covariance regularization in PLDA modeling for speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The SYSU system for the interspeech 2015 automatic speaker verification spoofing and countermeasures challenge.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Efficient autism spectrum disorder prediction with eye movement: A machine learning framework.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification.
Comput. Speech Lang., 2014

Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors.
Comput. Speech Lang., 2014

An iterative framework for unsupervised learning in the PLDA based speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Melody Extraction for Vocal Polyphonic Music Based on Bayesian Framework.
Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014

Simplified and supervised i-vector modeling for speaker age regression.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Automatic speaker age and gender recognition using acoustic and prosodic level information fusion.
Comput. Speech Lang., 2013

Multi-band long-term signal variability features for robust voice activity detection.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speaker verification based on fusion of acoustic and articulatory information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

TRAP language identification system for RATS phase II evaluation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speaker verification using simplified and supervised i-vector modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Automatic Vocal Segments Detection in Popular Music.
Proceedings of the Ninth International Conference on Computational Intelligence and Security, 2013

2012
KNOWME: An Energy-Efficient Multimodal Body Area Network for Physical Activity Monitoring.
ACM Trans. Embed. Comput. Syst., 2012

KNOWME: a case study in wireless body area sensor network design.
IEEE Commun. Mag., 2012

Intelligibility classification of pathological speech using fusion of multiple high level descriptors.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Speaker verification using Lasso based sparse total variability supervector with PLDA modeling.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection.
IEEE Trans. Signal Process., 2011

Speaker Verification Using Sparse Representations on Total Variability i-vectors.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals.
Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011

2010
Combining five acoustic level modeling methods for automatic speaker age and gender recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Robust ECG Biometrics by Fusing Temporal and Cepstral Information.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009
Automatic Singing Performance Evaluation for Untrained Singers.
IEICE Trans. Inf. Syst., 2009

Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection.
Proceedings of the Distributed Computing in Sensor Systems, 2009

Optimal time-resource allocation for activity-detection via multimodal sensing.
Proceedings of the 4th International ICST Conference on Body Area Networks, 2009

2008
Melody Track Selection Using Discriminative Language Model.
IEICE Trans. Inf. Syst., 2008

Automatic Language Identification with Discriminative Language Characterization Based on SVM.
IEICE Trans. Inf. Syst., 2008

Using SVM as Back-End Classifier for Language Identification.
EURASIP J. Audio Speech Music. Process., 2008

Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

An objective singing evaluation approach by relating acoustic measurements to perceptual ratings.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007
Singing Melody Extraction in Polyphonic Music by Harmonic Tracking.
Proceedings of the 8th International Conference on Music Information Retrieval, 2007

Spoken language identification using score vector modeling and support vector machine.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Authentication and Quality Monitoring based on Audio Watermark for Analog AM Shortwave Broadcasting.
Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), 2007

The Design of Backend Classifiers in PPRLM System for Language Identification.
Proceedings of the Third International Conference on Natural Computation, 2007

2006
A Top-down Approach to Melody Match in Pitch Contour for Query by Humming.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

An Efficient and Robust Approach to Audio ID Identification.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

A Novel Audio Watermarking in Wavelet Domain.
Proceedings of the Second International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), 2006

2000
Multi-group mixture weight HMM.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
