Zheng-Hua Tan

ORCID: 0000-0001-6856-8928

According to our database, Zheng-Hua Tan authored at least 238 papers between 2002 and 2024.

Bibliography

2024
Utilization of acoustic signals with generative Gaussian and autoencoder modeling for condition-based maintenance of injection moulds.
Int. J. Comput. Integr. Manuf., April, 2024

Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Investigating the Design Space of Diffusion Models for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Generating Accurate and Diverse Audio Captions Through Variational Autoencoder Framework.
IEEE Signal Process. Lett., 2024

The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems.
IEEE Signal Process. Lett., 2024

BiSSL: Bilevel Optimization for Self-Supervised Pre-Training and Fine-Tuning.
CoRR, 2024

Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs.
CoRR, 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts.
CoRR, 2024

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations.
CoRR, 2024

Noise-Robust Keyword Spotting through Self-supervised Pretraining.
CoRR, 2024

Joint Far- and Near-End Speech and Listening Enhancement With Minimum Processing.
IEEE Access, 2024

Near-End Listening Enhancement Using a Noise-Robust Linear Time-Invariant Filter.
Proceedings of the 18th International Workshop on Acoustic Signal Enhancement, 2024

Complex Recurrent Variational Autoencoder for Speech Resynthesis and Enhancement.
Proceedings of the International Joint Conference on Neural Networks, 2024

PAC-Bayesian Error Bound, via Rényi Divergence, for a Class of Linear Time-Invariant State-Space Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler.
Proceedings of the IEEE International Conference on Acoustics, 2024

Joint Minimum Processing Beamforming and Near-End Listening Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2024

Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2024

Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder.
Proceedings of the 32nd European Signal Processing Conference, 2024

Envelope Based Deep Source Separation and EEG Auditory Attention Decoding for Speech and Music.
Proceedings of the 32nd European Signal Processing Conference, 2024

PAC-Bayes Generalisation Bounds for Dynamical Systems including Stable RNNs.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Data Science Education: The Signal Processing Perspective [SP Education].
IEEE Signal Process. Mag., November, 2023

On the deficiency of intelligibility metrics as proxies for subjective intelligibility.
Speech Commun., May, 2023

On the Comparisons of Decorrelation Approaches for Non-Gaussian Neutral Vector Variables.
IEEE Trans. Neural Networks Learn. Syst., April, 2023

ACTUAL: Audio Captioning With Caption Feature Space Regularization.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Minimum Processing Near-End Listening Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Masked Autoencoders with Multi-Window Attention Are Better Audio Learners.
CoRR, 2023

PAC-Bayesian bounds for learning LTI-ss systems with input from empirical loss.
CoRR, 2023

Explicit construction of the minimum error variance estimator for stochastic LTI-ss systems.
Autom., 2023

Speech inpainting: Context-based speech synthesis guided by video.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Radio Sensing with Large Intelligent Surface for 6G.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Vision-Assisted Hearing Aid System Based on Deep Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improved Disentangled Speech Representations Using Contrastive Learning in Factorized Hierarchical Variational Autoencoder.
Proceedings of the 31st European Signal Processing Conference, 2023

2022
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Advanced Dropout: A Model-Free Methodology for Bayesian Dropout Optimization.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

PAC-Bayesian-Like Error Bound for a Class of Linear Time-Invariant Stochastic State-Space Models.
CoRR, 2022

Filterbank Learning for Small-Footprint Keyword Spotting Robust to Noise.
CoRR, 2022

Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise.
CoRR, 2022

Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining.
CoRR, 2022

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification.
CoRR, 2022

Training Data-Driven Speech Intelligibility Predictors on Heterogeneous Listening Test Data.
IEEE Access, 2022

Deep Spoken Keyword Spotting: An Overview.
IEEE Access, 2022

The Minimum Overlap-Gap Algorithm for Speech Enhancement.
IEEE Access, 2022

iVAE-GAN: Identifiable VAE-GAN Models for Latent Representation Learning.
IEEE Access, 2022

AoI and Throughput Optimization for Hybrid Traffic in Cellular Uplink Using Reinforcement Learning.
Proceedings of the 95th IEEE Vehicular Technology Conference, 2022

Floor Map Reconstruction Through Radio Sensing and Learning by a Large Intelligent Surface.
Proceedings of the 32nd IEEE International Workshop on Machine Learning for Signal Processing, 2022

Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.
Proceedings of the IEEE International Conference on Acoustics, 2022

An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting.
Proceedings of the 6th International Conference, 2022

User Localization using RF Sensing: A Performance comparison between LIS and mmWave Radars.
Proceedings of the 30th European Signal Processing Conference, 2022

2021
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification With Autoregressive Prediction Coding.
IEEE Signal Process. Lett., 2021

Assessing Wireless Sensing Potential With Large Intelligent Surfaces.
IEEE Open J. Commun. Soc., 2021

Deep InterBoost networks for small-sample image classification.
Neurocomputing, 2021

Self-segmentation of pass-phrase utterances for deep feature learning in text-dependent speaker verification.
Comput. Speech Lang., 2021

Design of AoI-Aware 5G Uplink Scheduler Using Reinforcement Learning.
CoRR, 2021

Optimal Prediction of Unmeasured Output from Measurable Outputs In LTI Systems.
CoRR, 2021

Improvement of Noise-Robust Single-Channel Voice Activity Detection with Spatial Pre-processing.
CoRR, 2021

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing.
CoRR, 2021

On TasNet for Low-Latency Single-Speaker Speech Enhancement.
CoRR, 2021

Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification.
CoRR, 2021

Remote Anomaly Detection in Industry 4.0 Using Resource-Constrained Devices.
Proceedings of the 22nd IEEE International Workshop on Signal Processing Advances in Wireless Communications, 2021

UIAI System for Short-Duration Speaker Verification Challenge 2020.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective.
Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021

Compression of DNNs Using Magnitude Pruning and Nonlinear Information Bottleneck Training.
Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021

Audio-Visual Speech Inpainting with Deep Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

Joint Maximum Likelihood Estimation of Power Spectral Densities and Relative Acoustic Transfer Functions for Acoustic Beamforming.
Proceedings of the IEEE International Conference on Acoustics, 2021

PAC-Bayesian theory for stochastic LTI systems.
Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), 2021

ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Design of AoI-Aware 5G Uplink Scheduler Using Reinforcement Learning.
Proceedings of the 4th IEEE 5G World Forum, 2021

2020
The Importance of Context When Recommending TV Content: Dataset and Algorithms.
IEEE Trans. Multim., 2020

OSLNet: Deep Small-Sample Classification With an Orthogonal Softmax Layer.
IEEE Trans. Image Process., 2020

Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Highlights From the Machine Learning for Signal Processing Technical Committee [In the Spotlight].
IEEE Signal Process. Mag., 2020

rVAD: An unsupervised segment-based robust voice activity detection method.
Comput. Speech Lang., 2020

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization.
CoRR, 2020

Data augmentation enhanced speaker enrollment for text-dependent speaker verification.
CoRR, 2020

On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors.
CoRR, 2020

Context-Aware Recommendations for Televisions Using Deep Embeddings with Relaxed N-Pairs Loss Objective.
CoRR, 2020

Vocoder-Based Speech Synthesis from Silent Videos.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CC-Loss: Channel Correlation Loss for Image Classification.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Adversarial Example Detection by Classification for Deep Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploring Filterbank Learning for Keyword Spotting.
Proceedings of the 28th European Signal Processing Conference, 2020

A Primer on Large Intelligent Surface (LIS) for Wireless Sensing in an Industrial Setting.
Proceedings of the Cognitive Radio-Oriented Wireless Networks, 2020

2019
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.
Speech Commun., 2019

Deep Joint Embeddings of Context and Content for Recommendation.
CoRR, 2019

SketchSegNet+: An End-to-End Learning of RNN for Multi-Class Sketch Semantic Segmentation.
IEEE Access, 2019

Subjective Annotations for Vision-based Attention Level Estimation.
Proceedings of the 14th International Joint Conference on Computer Vision, 2019

Soft Dropout And Its Variational Bayes Approximation.
Proceedings of the 29th IEEE International Workshop on Machine Learning for Signal Processing, 2019

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2019

Effects of Lombard Reflex on the Performance of Deep-learning-based Audio-visual Speech Enhancement Systems.
Proceedings of the IEEE International Conference on Acoustics, 2019

Robust Bayesian and Maximum a Posteriori Beamforming for Hearing Assistive Devices.
Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

2018
Wireless Personal Communications: Machine Learning for Big Data Processing in Mobile Internet.
Wirel. Pers. Commun., 2018

Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features.
IEEE Trans. Neural Networks Learn. Syst., 2018

Decorrelation of Neutral Vector Variables: Theory and Applications.
IEEE Trans. Neural Networks Learn. Syst., 2018

Using Closed-Set Speaker Identification Score Confidence to Enhance Audio-Based Collaborative Filtering for Multiple Users.
IEEE Trans. Consumer Electron., 2018

Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Bias-Compensated Informed Sound Source Localization Using Relative Transfer Functions.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Audio-Based Granularity-Adapted Emotion Classification.
IEEE Trans. Affect. Comput., 2018

A perceptually motivated LP residual estimator in noisy and reverberant environments.
Speech Commun., 2018

Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions.
Speech Commun., 2018

A spatial self-similarity based feature learning method for face recognition under varying poses.
Pattern Recognit. Lett., 2018

iSocioBot: A Multimodal Interactive Social Robot.
Int. J. Soc. Robotics, 2018

Recent advances in machine learning for non-Gaussian data processing.
Neurocomputing, 2018

Latent Dirichlet mixture model.
Neurocomputing, 2018

Incorporating pass-phrase dependent background models for text-dependent speaker verification.
Comput. Speech Lang., 2018

On the Equivalence between Objective Intelligibility and Mean-Squared Error for Deep Neural Network based Speech Enhancement.
CoRR, 2018

A Parallel/Distributed Algorithmic Framework for Mining All Quantitative Association Rules.
CoRR, 2018

A Dataset for Inferring Contextual Preferences of Users Watching TV.
Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 2018

The Sound or Silence: Investigating the Influence of Robot Noise on Proxemics.
Proceedings of the 27th IEEE International Symposium on Robot and Human Interactive Communication, 2018

Public perception of android robots: Indications from an analysis of YouTube comments.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Monaural Speech Enhancement Using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Frame Selection for Robust Speaker Identification: A Hybrid Approach.
Wirel. Pers. Commun., 2017

Visual Detection of Events of Interest from Urban Activity.
Wirel. Pers. Commun., 2017

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

DNN Filter Bank Cepstral Coefficients for Spoofing Detection.
CoRR, 2017

Time-Contrastive Learning Based Unsupervised DNN Feature Extraction for Speaker Verification.
CoRR, 2017

Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks.
CoRR, 2017

DNN Filter Bank Cepstral Coefficients for Spoofing Detection.
IEEE Access, 2017

Joint separation and denoising of noisy multi-talker speech using recurrent neural networks and permutation invariant training.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Adversarial Network Bottleneck Features for Noise Robust Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017


On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Weighted Score Based Fast Converging CO-training with Application to Audio-Visual Person Identification.
Proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence, 2017

Permutation invariant training of deep models for speaker-independent multi-talker speech separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A non-intrusive Short-Time Objective Intelligibility measure.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016
Improved Gaussian Mixture Models for Adaptive Foreground Segmentation.
Wirel. Pers. Commun., 2016

Total Variability Modeling Using Source-Specific Priors.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Predicting the Intelligibility of Noisy and Nonlinearly Processed Binaural Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

AMORE: design and implementation of a commercial-strength parallel hybrid movie recommendation engine.
Knowl. Inf. Syst., 2016

Using Theatre to Study Interaction with Care Robots.
Int. J. Soc. Robotics, 2016

Feature selection for neutral vector in EEG signal classification.
Neurocomputing, 2016

Text-Independent Speaker Identification Using the Histogram Transform Model.
IEEE Access, 2016

Effect of multi-condition training and speech enhancement methods on spoofing detection.
Proceedings of the First International Workshop on Sensing, 2016

Improving the convergence of co-training for audio-visual person identification.
Proceedings of the First International Workshop on Sensing, 2016

Background subtraction for patterns of activities in cities.
Proceedings of the First International Workshop on Sensing, 2016

Projecting emotional speech into arousal-valence space using pairwise preference learning.
Proceedings of the First International Workshop on Sensing, 2016

Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Dirichlet mixture allocation.
Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing, 2016

Privacy protection performance of De-identified face images with and without background.
Proceedings of the 39th International Convention on Information and Communication Technology, 2016

Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Adaptive overcurrent protection for microgrids in extensive distribution systems.
Proceedings of the IECON 2016, 2016

Informed Direction of Arrival estimation using a spherical-head model for Hearing Aid applications.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A method for predicting the intelligibility of noisy and non-linearly enhanced binaural speech.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Concurrent localization of sound sources and dual-microphone sub-arrays using TOFs.
Proceedings of the 19th International Conference on Information Fusion, 2016

2015
Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features-A Theoretically Consistent Approach.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Im2Sketch: Sketch generation by unconflicted perceptual grouping.
Neurocomputing, 2015

Binary pattern flavored feature extractors for Facial Expression Recognition: An overview.
Proceedings of the 38th International Convention on Information and Communication Technology, 2015

Assessing the Potential Use of Eye-Tracking Triangulation for Evaluating the Usability of an Online Diabetes Exercise System.
Proceedings of the MEDINFO 2015: eHealth-enabled Health, 2015

Neighbors Based Discriminative Feature Difference Learning for Kinship Verification.
Proceedings of the Advances in Visual Computing - 11th International Symposium, 2015

A heuristic approach for a social robot to navigate to a person based on audio and range information.
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015

Comparison of forced-alignment speech recognition and humans for generating reference VAD.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A binaural short time objective intelligibility measure for noisy and enhanced speech.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Local feature learning for face recognition under varying poses.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

A feature subtraction method for image based kinship verification under uncontrolled environments.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Source-specific informative prior for i-vector extraction.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

On the influence of microphone array geometry on HRTF-based Sound Source Localization.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Maximum likelihood approach to "informed" Sound Source Localization for Hearing Aid applications.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Informed TDoA-based direction of arrival estimation for hearing aid applications.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

A discriminative approach for speaker selection in speaker de-identification systems.
Proceedings of the 23rd European Signal Processing Conference, 2015

2014
Combination of Multiple Measurement Cues for Visual Face Tracking.
Wirel. Pers. Commun., 2014

Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference.
J. Signal Process. Syst., 2014

Using Audio-Derived Affective Offset to Enhance TV Recommendation.
IEEE Trans. Multim., 2014

Implementing a Commercial-Strength Parallel Hybrid Movie Recommendation Engine.
IEEE Intell. Syst., 2014

Joint variable frame rate and length analysis for speech recognition under adverse conditions.
Comput. Electr. Eng., 2014

Improving Robustness Against Environmental Sounds for Directing Attention of Social Robots.
Proceedings of the Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, 2014

Utilising Tree-Based Ensemble Learning for Speaker Segmentation.
Proceedings of the Artificial Intelligence Applications and Innovations, 2014

Cluster-based adaptation using density forest for HMM phone recognition.
Proceedings of the 22nd European Signal Processing Conference, 2014

2013
Audio-based age and gender identification to enhance the recommendation of TV content.
IEEE Trans. Consumer Electron., 2013

A heuristic hierarchical scheme for academic search and retrieval.
Inf. Process. Manag., 2013

Multi-frame rate based multiple-model training for robust speaker identification of disguised voice.
Proceedings of the 16th International Symposium on Wireless Personal Multimedia Communications, 2013

Perceptual grouping via untangling Gestalt principles.
Proceedings of the 2013 Visual Communications and Image Processing, 2013

Fusing eye-gaze and speech recognition for tracking in an automatic reading tutor - a step in the right direction?
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Demographic recommendation by means of group profile elicitation using speaker age and gender recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Developing a speaker identification system for the DARPA RATS project.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
A Joint Approach for Single-Channel Speaker Identification and Speech Separation.
IEEE Trans. Speech Audio Process., 2012

Guest Editors' Introduction to the Special Issue on "New Trends in Signal Processing and Biomedical Engineering".
Comput. Electr. Eng., 2012

EEG signal classification with super-Dirichlet mixture model.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

PubSearch - A Hierarchical Heuristic Scheme for Ranking Academic Search Results.
Proceedings of the ICPRAM 2012, 2012

2011
Convex Combination of Multiple Statistical Models With Application to VAD.
IEEE Trans. Speech Audio Process., 2011

Technology-enabled social learning: a review.
Int. J. Knowl. Learn., 2011

Feature selection strategy for classification of single-trial EEG elicited by motor imagery.
Proceedings of the 14th International Symposium on Wireless Personal Multimedia Communications, 2011

Evaluating tracking accuracy of an automatic reading tutor.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2011

Combining acoustic and language model miscue detection methods for adult dyslexic read speech.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2011

Command & control: Information merging, selective visualization and decision support for emergency handling.
Proceedings of the 8th International Conference on Information Systems for Crisis Response and Management, 2011

Mobile video annotation for enhanced rich media communication during emergency handling.
Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, 2011

Multi-Sensor Voice Activity Detection Based on Multiple Observation Hypothesis Testing.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Low-Complexity Variable Frame Rate Analysis for Speech Recognition and Voice Activity Detection.
IEEE J. Sel. Top. Signal Process., 2010

Introduction to the Issue on Speech Processing for Natural Interaction With Intelligent Environments.
IEEE J. Sel. Top. Signal Process., 2010

Improving monaural speaker identification by double-talk detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Signal-to-Signal Ratio Independent Speaker Identification for Co-channel Speech Signals.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Joint single-channel speech separation and speaker identification.
Proceedings of the IEEE International Conference on Acoustics, 2010

Crowd analysis by using optical flow and density based clustering.
Proceedings of the 18th European Signal Processing Conference, 2010

Three-dimensional adaptive sensing of people in a multi-camera setup.
Proceedings of the 18th European Signal Processing Conference, 2010

2009
Audio and Speech Processing for Data Mining.
Proceedings of the Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 2009

High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A system for detecting miscues in dyslexic read speech.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008
Robust Speech Recognition by Nonlocal Means Denoising Processing.
IEEE Signal Process. Lett., 2008

A posteriori SNR weighted energy based variable frame rate analysis for speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech Recognition on Mobile Devices.
Proceedings of the Mobile Multimedia Processing: Fundamentals, 2008

2007
Noise Condition-Dependent Training Based on Noise Classification and SNR Estimation.
IEEE Trans. Speech Audio Process., 2007

Exploiting Temporal Correlation of Speech for Error Robust and Bandwidth Flexible Distributed Speech Recognition.
IEEE Trans. Speech Audio Process., 2007

2006
Fuzzy Metagraph and Its Combination with the Indexing Approach in Rule-Based Systems.
IEEE Trans. Knowl. Data Eng., 2006

Robust speech recognition over mobile networks using combined weighted viterbi decoding and subvector based error concealment.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Robust Speech Recognition From Noise-Type Based Feature Compensation and Model Interpolation in a Multiple Model Framework.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Automatic speech recognition over error-prone wireless networks.
Speech Commun., 2005

Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End.
Proceedings of the IEEE 7th Workshop on Multimedia Signal Processing, 2005

Robust speech recognition based on noise and SNR classification - a multiple-model framework.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Robust speech recognition in ubiquitous networking and context-aware computing.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004
Spectral subtraction with full-wave rectification and likelihood controlled instantaneous noise estimation for robust speech recognition.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

On the integration of speech recognition into personal networks.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A subvector-based error concealment algorithm for speech recognition over mobile networks.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
OOV-detection and channel error protection for distributed speech recognition over wireless networks.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Channel error protection scheme for distributed speech recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

