Ming Li
ORCID: 0000-0002-6406-1983
Affiliations:
- Duke Kunshan University, Data Science Research Center, China
- Sun Yat-Sen University Carnegie Mellon University Joint Institute of Engineering, China (former)
- University of Southern California, Los Angeles, CA, USA (former)
- Chinese Academy of Sciences, Institute of Acoustics, China (former)
According to our database, Ming Li authored at least 197 papers between 2000 and 2024.
Bibliography
2024
Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks.
Comput. Speech Lang., April, 2024
HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children.
IEEE Trans. Learn. Technol., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Joint Training on Multiple Datasets With Inconsistent Labeling Criteria for Facial Expression Recognition.
IEEE Trans. Affect. Comput., 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning.
CoRR, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Autistic Children's Social Skills.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024
2023
A Complementary Dual-Branch Network for Appearance-Based Gaze Estimation From Low-Resolution Facial Image.
IEEE Trans. Cogn. Dev. Syst., September, 2023
Accurate Head Pose Estimation Using Image Rectification and a Lightweight Convolutional Neural Network.
IEEE Trans. Multim., 2023
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Typical Facial Expression Network Using a Facial Feature Decoupler and Spatial-Temporal Learning.
IEEE Trans. Affect. Comput., 2023
IEEE Trans. Affect. Comput., 2023
STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition.
IEEE Trans. Affect. Comput., 2023
Comput. Speech Lang., 2023
CoRR, 2023
Graph Neural Network-Aided Exploratory Learning for Community Detection with Unknown Topology.
CoRR, 2023
Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion.
Biomed. Signal Process. Control., 2023
Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Robust Audio Anti-spoofing Countermeasure with Joint Training of Front-end and Back-end Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features.
Proceedings of the IEEE International Conference on Acoustics, 2023
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Identifying Source Speakers for Voice Conversion Based Spoofing Attacks on Speaker Verification Systems.
Proceedings of the IEEE International Conference on Acoustics, 2023
From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy.
EURASIP J. Audio Speech Music. Process., 2022
CoRR, 2022
Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios(V1).
CoRR, 2022
Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words.
CoRR, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Generating TTS Based Adversarial Samples for Training Wake-Up Word Detection Systems Against Confusing Words.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Single-Channel Target Speaker Separation Using Joint Training with Target Speaker's Pitch Information.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos.
Proceedings of the 26th International Conference on Pattern Recognition, 2022
SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System for Both Human Beings and Machines.
Proceedings of the IEEE International Conference on Acoustics, 2022
Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for the M2met Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE Trans. Affect. Comput., 2021
Frontiers Comput. Neurosci., 2021
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
CoRR, 2021
The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge.
CoRR, 2021
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge.
CoRR, 2021
CoRR, 2021
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss.
CoRR, 2021
Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion.
CoRR, 2021
The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge.
CoRR, 2021
Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Our Learned Lessons from Cross-Lingual Speaker Verification: The CRMI-DKU System Description for the Short-Duration Speaker Verification Challenge 2021.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Call For Help Detection In Emergent Situations Using Keyword Spotting And Paralinguistic Analysis.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification.
CoRR, 2020
CoRR, 2020
Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling.
CoRR, 2020
Mutli-task Learning with Alignment Loss for Far-field Small-Footprint Keyword Spotting.
CoRR, 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening.
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Within-Sample Variability-Invariant Loss for Robust Speaker Recognition Under Noisy Environments.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
2019
String Stability Analysis for Vehicle Platooning Under Unreliable Communication Links With Event-Triggered Strategy.
IEEE Trans. Veh. Technol., 2019
An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder.
Comput. Speech Lang., 2019
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Intelligent Robotics and Applications - 12th International Conference, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Facial Expression Recognition with Identity and Spatial-temporal Integrated Learning.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019
2018
Cancellable speech template via random binary orthogonal matrices projection hashing.
Pattern Recognit., 2018
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Unsupervised query by example spoken term detection using features concatenated with Self-Organizing Map distances.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid.
IEEE Trans. Smart Grid, 2017
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Response to name: A dataset and a multimodal machine learning framework towards autism study.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017
Automatic emotional spoken language text corpus construction from written dialogs in fictions.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017
2016
Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification.
J. Signal Process. Syst., 2016
Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.
Comput. Speech Lang., 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Text-independent voice conversion using deep neural network based phonetic level features.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
2015
Comput. Speech Lang., 2015
The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge.
CoRR, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Duration dependent covariance regularization in PLDA modeling for speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
The SYSU system for the interspeech 2015 automatic speaker verification spoofing and countermeasures challenge.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Efficient autism spectrum disorder prediction with eye movement: A machine learning framework.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification.
Comput. Speech Lang., 2014
Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors.
Comput. Speech Lang., 2014
An iterative framework for unsupervised learning in the PLDA based speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion.
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Automatic speaker age and gender recognition using acoustic and prosodic level information fusion.
Comput. Speech Lang., 2013
Multi-band long-term signal variability features for robust voice activity detection.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the Ninth International Conference on Computational Intelligence and Security, 2013
2012
KNOWME: An Energy-Efficient Multimodal Body Area Network for Physical Activity Monitoring.
ACM Trans. Embed. Comput. Syst., 2012
IEEE Commun. Mag., 2012
Intelligibility classification of pathological speech using fusion of multiple high level descriptors.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Speaker verification using Lasso based sparse total variability supervector with PLDA modeling.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
IEEE Trans. Signal Process., 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors.
Proceedings of the IEEE International Conference on Acoustics, 2011
Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals.
Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011
2010
Combining five acoustic level modeling methods for automatic speaker age and gender recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 20th International Conference on Pattern Recognition, 2010
2009
IEICE Trans. Inf. Syst., 2009
Proceedings of the Distributed Computing in Sensor Systems, 2009
Proceedings of the 4th International ICST Conference on Body Area Networks, 2009
2008
IEICE Trans. Inf. Syst., 2008
Automatic Language Identification with Discriminative Language Characterization Based on SVM.
IEICE Trans. Inf. Syst., 2008
EURASIP J. Audio Speech Music. Process., 2008
Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
An objective singing evaluation approach by relating acoustic measurements to perceptual ratings.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
2007
Proceedings of the 8th International Conference on Music Information Retrieval, 2007
Spoken language identification using score vector modeling and support vector machine.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Authentication and Quality Monitoring based on Audio Watermark for Analog AM Shortwave Broadcasting.
Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), 2007
Proceedings of the Third International Conference on Natural Computation, 2007
2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the Second International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), 2006
2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000