Yu Tsao

ORCID: 0000-0001-6956-0418

Affiliations:
  • Academia Sinica, Research Center for Information Technology Innovation, Taipei, Taiwan


According to our database, Yu Tsao authored at least 391 papers between 2001 and 2024.

Bibliography

2024
ElectrodeNet - A Deep-Learning-Based Sound Coding Strategy for Cochlear Implants.
IEEE Trans. Cogn. Dev. Syst., February, 2024

An SRAM-Based Reconfigurable Cognitive Computation Matrix for Sensor Edge Applications.
IEEE J. Solid State Circuits, February, 2024

Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

MECG-E: Mamba-based ECG Enhancer for Baseline Wander Removal.
CoRR, 2024

MC-SEMamba: A Simple Multi-channel Extension of SEMamba.
CoRR, 2024

Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing.
CoRR, 2024

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement.
CoRR, 2024

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models.
CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.
CoRR, 2024

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset.
CoRR, 2024

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction.
CoRR, 2024

Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR.
CoRR, 2024

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement.
CoRR, 2024

EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations.
CoRR, 2024

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models.
CoRR, 2024

An Investigation of Incorporating Mamba for Speech Enhancement.
CoRR, 2024

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes.
CoRR, 2024

Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids.
CoRR, 2024

Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues.
CoRR, 2024

A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals.
CoRR, 2024

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids.
CoRR, 2024

Prognosticating Lumbar Spinal Surgery Outcomes for Low Back Pain and Sciatica Patients by Utilizing Preoperative Assessments from Western and Eastern Medicine and Multimodal Fusion Learning Techniques.
Proceedings of the 2024 8th International Conference on Medical and Health Informatics, 2024

A Study On Incorporating Whisper For Robust Speech Assessment.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice.
Proceedings of the IEEE International Conference on Consumer Electronics, 2024

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2024

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

SDEMG: Score-Based Diffusion Model for Surface Electromyographic Signal Denoising.
Proceedings of the IEEE International Conference on Acoustics, 2024

Bridging the Gap: Integrating Pre-Trained Speech Enhancement and Recognition Models for Robust Speech Recognition.
Proceedings of the 32nd European Signal Processing Conference, 2024

The Multilayer Neural Network Implementation Using SRAM-Based Reconfigurable Cognitive Computation Matrices.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023

Toward Real-World Voice Disorder Classification.
IEEE Trans. Biomed. Eng., October, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023

Cyclostationary Impulse Noise Dataset.
Dataset, October, 2023

SRECG: ECG Signal Super-Resolution Framework for Portable/Wearable Devices in Cardiac Arrhythmias Classification.
IEEE Trans. Consumer Electron., August, 2023

Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Multi-Target Extractor and Detector for Unknown-Number Speaker Diarization.
IEEE Signal Process. Lett., 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model.
CoRR, 2023

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection.
CoRR, 2023

Neural domain alignment for spoken language recognition based on optimal transport.
CoRR, 2023

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection.
CoRR, 2023

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement.
CoRR, 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
CoRR, 2023

Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
CoRR, 2023

Noise robust speech emotion recognition with signal-to-noise ratio adapting speech enhancement.
CoRR, 2023

Deep denoising autoencoder-based non-invasive blood flow detection for arteriovenous fistula.
CoRR, 2023

Preoperative Prognosis Assessment of Lumbar Spinal Surgery for Low Back Pain and Sciatica Patients based on Multimodalities and Multimodal Learning.
CoRR, 2023

Self-supervised based general laboratory progress pretrained model for cardiovascular event detection.
CoRR, 2023

BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm.
CoRR, 2023

Advances in biomedical signal processing for communication disorders.
Biomed. Signal Process. Control., 2023

Wearable-based Pain Assessment in Patients with Adhesive Capsulitis Using Machine Learning.
Proceedings of the 11th International IEEE/EMBS Conference on Neural Engineering, 2023

IANS: Intelligibility-Aware Null-Steering Beamforming for Dual-Microphone Arrays.
Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

Inference and Denoise: Causal Inference-Based Neural Speech Enhancement.
Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

Voice Direction-Of-Arrival Conversion.
Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks.
Proceedings of the IEEE International Symposium on Medical Measurements and Applications, 2023

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Interpretations of Domain Adaptations via Layer Variational Analysis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

D4AM: A General Denoising Framework for Downstream Acoustic Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional Networks.
Proceedings of the IEEE International Conference on Acoustics, 2023

On the Robustness of Non-Intrusive Speech Quality Model by Adversarial Examples.
Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids.
Proceedings of the IEEE International Conference on Acoustics, 2023

T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding Via Phoneme Level T5.
Proceedings of the IEEE International Conference on Acoustics, 2023

PreFallKD: Pre-Impact Fall Detection Via CNN-ViT Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Task Learning U-Net for Functional Shoulder Sub-Task Segmentation.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Cross-Modal Alignment With Optimal Transport For CTC-Based ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Improved Lite Audio-Visual Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Deep-Learning-Based Signal Enhancement of Low-Resolution Accelerometer for Fall Detection Systems.
IEEE Trans. Cogn. Dev. Syst., 2022

A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement.
IEEE Trans. Artif. Intell., 2022

SVSNet: An End-to-End Speaker Voice Similarity Assessment Model.
IEEE Signal Process. Lett., 2022

EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning.
IEEE Signal Process. Lett., 2022

Neural correlates of individual differences in predicting ambiguous sounds comprehension level.
NeuroImage, 2022

Audio-Visual Speech Enhancement and Separation by Leveraging Multi-Modal Self-Supervised Embeddings.
CoRR, 2022

CasNet: Investigating Channel Robustness for Speech Separation.
CoRR, 2022

A Teacher-student Framework for Unsupervised Speech Enhancement Using Noise Remixing Training and Two-stage Inference.
CoRR, 2022

Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN.
CoRR, 2022

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning.
CoRR, 2022

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks.
CoRR, 2022

Filter-based Discriminative Autoencoders for Children Speech Recognition.
CoRR, 2022

Partial Coupling of Optimal Transport for Spoken Language Identification.
CoRR, 2022

Multi-Target Filter and Detector for Speaker Diarization.
CoRR, 2022

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition.
CoRR, 2022

Continuous Speech for Improved Learning Pathological Voice Disorders.
CoRR, 2022

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning.
CoRR, 2022

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application.
IEEE Access, 2022

Chinese Movie Dialogue Question Answering Dataset.
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing, 2022

Preservation Of Interaural Level Difference Cue In A Deep Learning-Based Speech Separation System For Bilateral And Bimodal Cochlear Implants Users.
Proceedings of the 17th International Workshop on Acoustic Signal Enhancement, 2022

Speech Enhancement Based on CycleGAN with Noise-informed Training.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

OSSEM: one-shot speaker adaptive speech enhancement using meta learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Perceptual Characteristics Based Multi-objective Model for Speech Enhancement.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Boosting Self-Supervised Embeddings for Speech Enhancement.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Perceptual Contrast Stretching on Target Feature for Speech Enhancement.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing.
Proceedings of the IEEE International Conference on Acoustics, 2022

Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Proceedings of the IEEE International Conference on Acoustics, 2022

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Conditional Diffusion Probabilistic Model for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Analyzing The Robustness of Unsupervised Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speech Recovery For Real-World Self-Powered Intermittent Devices.
Proceedings of the IEEE International Conference on Acoustics, 2022

MetricGAN-U: Unsupervised Speech Enhancement/Dereverberation Based Only on Noisy/Reverberated Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

Key Generation with Ambient Audio.
Proceedings of the IEEE Global Communications Conference, 2022

Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation.
Proceedings of the 30th European Signal Processing Conference, 2022

Dysarthric Speech Enhancement Based on Convolution Neural Network.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

2021
Dress With Style: Learning Style From Joint Deep Embedding of Clothing Styles and Body Shapes.
IEEE Trans. Multim., 2021

Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Study of Joint Effect on Denoising Techniques and Visual Cues to Improve Speech Intelligibility in Cochlear Implant Simulation.
IEEE Trans. Cogn. Dev. Syst., 2021

Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media.
IEEE Trans. Cogn. Dev. Syst., 2021

Sensing ecosystem dynamics via audio source separation: A case study of marine soundscapes off northeastern Taiwan.
PLoS Comput. Biol., 2021

Predicting the Travel Distance of Patients to Access Healthcare using Deep Neural Networks.
CoRR, 2021

Toward Real-World Pathological Voice Detection.
CoRR, 2021

InQSS: a speech intelligibility assessment model using a multi-task learning network.
CoRR, 2021

Speech Enhancement-assisted StarGAN Voice Conversion in Noisy Environments.
CoRR, 2021

A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming.
CoRR, 2021

Intermittent Speech Recovery.
CoRR, 2021

The AS-NU System for the M2VoC Challenge.
CoRR, 2021

Integrating a joint Bayesian generative model in a discriminative learning framework for speaker verification.
CoRR, 2021

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering.
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing, 2021

Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Deep Learning and Explainable Artificial Intelligence to Predict Patients' Choice of Hospital Levels in Urban and Rural Areas.
Proceedings of the MEDINFO 2021: One World, One Health - Global Partnership for Digital Innovation, 2021

MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Attention-Based Multi-Task Learning for Speech-Enhancement and Speaker-Identification in Multi-Speaker Dialogue Scenario.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

One Shot Learning for Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Enhancement with Zero-Shot Model Selection.
Proceedings of the 29th European Signal Processing Conference, 2021

A Study of Incorporating Articulatory Movement Information in Speech Enhancement.
Proceedings of the 29th European Signal Processing Conference, 2021

Instrumented shoulder functional assessment using inertial measurement units for frozen shoulder.
Proceedings of the IEEE EMBS International Conference on Biomedical and Health Informatics, 2021

Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Study on Speech Enhancement Based on Diffusion Probabilistic Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

MIMO Speech Compression and Enhancement Based on Convolutional Denoising Autoencoder.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Estimation and Correction of Relative Transfer Function for Binaural Speech Separation Networks to Preserve Spatial Cues.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Blind Monaural Source Separation on Heart and Lung Sounds Based on Periodic-Coded Deep Autoencoder.
IEEE J. Biomed. Health Informatics, 2020

Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion.
IEEE Trans. Emerg. Top. Comput. Intell., 2020

Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation.
IEEE Trans. Cogn. Dev. Syst., 2020

Time-Domain Multi-Modal Bone/Air Conducted Speech Enhancement.
IEEE Signal Process. Lett., 2020

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement.
IEEE Signal Process. Lett., 2020

Learning With Learned Loss Function: Speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality.
IEEE Signal Process. Lett., 2020

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020

Domain-adaptive Fall Detection Using Deep Adversarial Training.
CoRR, 2020

ECG Signal Super-resolution by Considering Reconstruction and Cardiac Arrhythmias Classification Loss.
CoRR, 2020

Speech enhancement guided by contextual articulatory information.
CoRR, 2020

Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement.
CoRR, 2020

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application.
CoRR, 2020

Using Deep Learning and Explainable Artificial Intelligence in Patients' Choices of Hospital Levels.
CoRR, 2020

Boosting Objective Scores of Speech Enhancement Model through MetricGAN Post-Processing.
CoRR, 2020

SADDEL: Joint Speech Separation and Denoising Model based on Multitask Learning.
CoRR, 2020

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders.
CoRR, 2020

The IPIN 2019 Indoor Localisation Competition - Description and Results.
IEEE Access, 2020

Incorporating Broad Phonetic Information for Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lite Audio-Visual Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Space-Time Guided Association Learning For Unsupervised Person Re-Identification.
Proceedings of the IEEE International Conference on Image Processing, 2020

Exponentiated magnitude spectrogram-based relative-to-maximum masking for speech enhancement in adverse environments.
Proceedings of the IEEE International Conference on Consumer Electronics - Taiwan, 2020

Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Cross-Technology Interference Mitigation Using Fully Convolutional Denoising Autoencoders.
Proceedings of the IEEE Global Communications Conference, 2020

The Academia Sinica Systems of Voice Conversion for VCC2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Computation-Performance Optimization of Convolutional Neural Networks With Redundant Filter Removal.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019

Toward Automating Oral Presentation Scoring During Principal Certification Program Using Audio-Video Low-Level Behavior Profiles.
IEEE Trans. Affect. Comput., 2019

Increasing Compactness of Deep Learning Based Speech Enhancement Models With Parameter Pruning and Quantization Techniques.
IEEE Signal Process. Lett., 2019

Deep progressive multi-scale attention for acoustic event classification.
CoRR, 2019

MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation.
CoRR, 2019

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing.
CoRR, 2019

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement.
CoRR, 2019

Distributed Microphone Speech Enhancement based on Deep Learning.
CoRR, 2019

The ASVspoof 2019 database.
CoRR, 2019

Seeing Voices in Noise: A Study of Audiovisual-Enhanced Vocoded Speech Intelligibility in Cochlear Implant Simulation.
CoRR, 2019

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement.
CoRR, 2019

Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks.
CoRR, 2019

Robust S1 and S2 heart sound recognition based on spectral restoration and multi-style training.
Biomed. Signal Process. Control., 2019

Evaluating Indoor Positioning Systems in a Shopping Mall: The Lessons Learned From the IPIN 2018 Competition.
IEEE Access, 2019

Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders.
IEEE Access, 2019

Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking.
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing, 2019

Garment Detectives: Discovering Clothes and Its Genre in Consumer Photos.
Proceedings of the 2nd IEEE Conference on Multimedia Information Processing and Retrieval, 2019

Bone-Conducted Speech Enhancement Using Hierarchical Extreme Learning Machine.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

Comparative Study of Masking and Mapping Based on Hierarchical Extreme Learning Machine for Speech Enhancement.
Proceedings of the 2019 International Symposium on Intelligent Signal Processing and Communication Systems, 2019

Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Noise Adaptive Speech Enhancement Using Domain Adversarial Training.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Incorporating Symbolic Sequential Modeling for Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.
Proceedings of the 36th International Conference on Machine Learning, 2019

Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Audio-Visual Speech Enhancement using Hierarchical Extreme Learning Machine.
Proceedings of the 27th European Signal Processing Conference, 2019

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion.
Proceedings of the 27th European Signal Processing Conference, 2019

Subjective Feedback-based Neural Network Pruning for Speech Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Compressed Multimodal Hierarchical Extreme Learning Machine for Speech Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

A Pruned-CELP Speech Codec Using Denoising Autoencoder with Spectral Compensation for Quality and Intelligibility Enhancement.
Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems, 2019

2018
Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks.
IEEE Trans. Emerg. Top. Comput. Intell., 2018

Suppression by Selecting Wavelets for Feature Compression in Distributed Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Bone-conducted speech enhancement using deep denoising autoencoder.
Speech Commun., 2018

SmartHear: A Smartphone-Based Remote Microphone Hearing Assistive System Using Wireless Technologies.
IEEE Syst. J., 2018

Off-Line Evaluation of Mobile-Centric Indoor Positioning Systems: The Experiences from the 2017 IPIN Competition.
Sensors, 2018

Locally Linear Embedding Based Post-Filtering for Speech Enhancement.
J. Inf. Sci. Eng., 2018

Voice Conversion Based on Locally Linear Embedding.
J. Inf. Sci. Eng., 2018

Robustness against the channel effect in pathological voice detection.
CoRR, 2018

Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform.
CoRR, 2018

Speech Dereverberation Based on Integrated Deep and Ensemble Learning.
CoRR, 2018

Adaptive Noise Cancellation Using Deep Cerebellar Model Articulation Controller.
IEEE Access, 2018

A Study on Speech Enhancement Using Exponent-Only Floating Point Quantized Neural Network (EOFP-QNN).
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Architecture Design of Convolutional Neural Networks for Face Detection on an FPGA Platform.
Proceedings of the 2018 IEEE International Workshop on Signal Processing Systems, 2018

WaveNet 聲碼器及其於語音轉換之應用 (WaveNet Vocoder and its Applications in Voice Conversion) [In Chinese].
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, 2018

Automatic Detection of Speech Under Cold Using Discriminative Autoencoders and Strength Modeling with Multiple Sub-Dictionary Generation.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

iOS-based Ear Scale application for Clinical Audiology and Otology Usage.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Hearing aids APP design based on deep learning technology.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Exemplar-Based Spectral Detail Compensation for Voice Conversion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Temporal Attentive Pooling for Acoustic Event Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Industrial IoT Analysis System Based on Machining Data of Metal Materials.
Proceedings of the International Conference on Fuzzy Theory and Its Applications, 2018

A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Enhancement and Analysis of Conversational Speech: JSALT 2017.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Congruent Visual Stimulation Facilitates Auditory Frequency Change Detection: An ERP Study.
Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

Improving the performance of hearing aids in noisy environments based on deep learning technology.
Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
A Deep Denoising Autoencoder Approach to Improving the Intelligibility of Vocoded Speech in Cochlear Implant Simulation.
IEEE Trans. Biomed. Eng., 2017

Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.
IEEE Trans. Biomed. Eng., 2017

S1 and S2 Heart Sound Recognition Using Deep Neural Networks.
IEEE Trans. Biomed. Eng., 2017

Personalizing Recurrent-Neural-Network-Based Language Model by Social Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Replay Spoofing Detection System Based on Discriminative Autoencoders.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017

Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017

Regularization of neural network model with distance metric learning for i-vector based spoken language identification.
Comput. Speech Lang., 2017

Multi-style learning with denoising autoencoders for acoustic modeling in the internet of things (IoT).
Comput. Speech Lang., 2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.
CoRR, 2017

Adaptive Noise Cancellation Using Deep Cerebellar Model Articulation Controller.
CoRR, 2017

Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network.
CoRR, 2017

Multi-Metrics Learning for Speech Enhancement.
CoRR, 2017

Experimental Study on Extreme Learning Machine Applications for Speech Enhancement.
IEEE Access, 2017

A Smartphone-Based Multi-Functional Hearing Assistive System to Facilitate Speech Recognition in the Classroom.
IEEE Access, 2017

以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

以語音能量特性發展即時語速偵測裝置-前導型研究 (Real-time monitoring device of phonation speed and volume based on speech energy: A pilot study) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

改進的向量空間可適性濾波器用於聲學回聲消除 (Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Object-based on-line video summarization for internet of video things.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Discriminative Autoencoders for Acoustic Modeling.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Wavelet Speech Enhancement Based on Robust Principal Component Analysis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A locally linear embedding based postfiltering approach for speech enhancement.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Discriminative autoencoders for speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Track-Clustering Error Evaluation for Track-Based Multi-camera Tracking System Employing Human Re-identification.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

A deep learning based noise reduction approach to improve speech intelligibility for cochlear implant recipients in the presence of competing speech noise.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Fast locally linear embedding algorithm for exemplar-based voice conversion.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Raw waveform-based speech enhancement by fully convolutional networks.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Acoustic echo cancellation using deep cerebellar model articulation controller.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization.
IEEE Signal Process. Lett., 2016

Generalized maximum a posteriori spectral amplitude estimation for speech enhancement.
Speech Commun., 2016

Modeling speech intelligibility with recovered envelope from temporal fine structure stimulus.
Speech Commun., 2016

Transportation Modes Classification Using Sensors on Smartphones.
Sensors, 2016

Maximum Entropy Learning with Deep Belief Networks.
Entropy, 2016

Robust Beamforming Against DoA Mismatch Using Subspace-Constrained Diagonal Loading.
CoRR, 2016

Image Retrieval Using Color-Aware Tag on Progressive Image Search and Recommendation System.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

A pseudo-task design in multi-task learning deep neural network for speaker recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving the performance of speech perception in noisy environment based on an FAME strategy.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Incorporating local environment information with ensemble neural networks to robust automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Dictionary update for NMF-based voice conversion using an encoder-decoder network.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Locally Linear Embedding for Exemplar-Based Spectral Conversion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech enhancement via ensemble modeling NMF adaptation.
Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan, 2016

Leveraging nonnegative matrix factorization in processing the temporal modulation spectrum for speech enhancement.
Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan, 2016

Nonnegative matrix factorization-based frequency lowering technology for Mandarin-speaking hearing aid users.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A study of mobile advertisement recommendation using real big data from AdLocus.
Proceedings of the IEEE 5th Global Conference on Consumer Electronics, 2016

A linear regression model with dynamic pulse transit time features for noninvasive blood pressure prediction.
Proceedings of the IEEE Biomedical Circuits and Systems Conference, 2016

Temporal Modulation Spectral Restoration for Robust Speech Recognition.
Proceedings of the IEEE Second International Conference on Multimedia Big Data, 2016

Adaptive subspace-constrained diagonal loading.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Voice conversion from non-parallel corpora using variational auto-encoder.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Audio-visual speech enhancement using deep neural networks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Compensating for Orientation Mismatch in Robust Wi-Fi Localization Using Histogram Equalization.
IEEE Trans. Veh. Technol., 2015

Acoustic Echo Cancellation Using a Vector-Space-Based Adaptive Filtering Algorithm.
IEEE Signal Process. Lett., 2015

Ensemble environment modeling using affine transform group.
Speech Commun., 2015

Rapid Converging M-Max Partial Update Least Mean Square Algorithms with New Variable Step-Size Methods.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015

Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation.
IEICE Trans. Inf. Syst., 2015

類神經網路訓練結合環境群集及專家混合系統於強健性語音辨識 (Automatic Speech Recognition using Neural Network based Acoustic Model with the Environment Clustering and Mixture of Experts Algorithms) [In Chinese].
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, 2015

Sparse representation with temporal max-smoothing for acoustic event detection.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech recognition with temporal neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A deep neural network based approach to mandarin consonant/vowel separation.
Proceedings of the IEEE International Conference on Consumer Electronics - Taiwan, 2015

Temporal information in tone recognition.
Proceedings of the IEEE International Conference on Consumer Electronics - Taiwan, 2015

A discriminative post-filter for speech enhancement in hearing aids.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multimodal arousal rating using unsupervised fusion technique.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A new frequency lowering technique for Mandarin-speaking hearing aid users.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

Temporal alignment for deep neural networks.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A probabilistic interpretation for artificial neural network-based voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Variable Selection Linear Regression for Robust Speech Recognition.
IEICE Trans. Inf. Syst., 2014

Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation.
Comput. Speech Lang., 2014

Effect of adaptive envelope compression in simulated electric hearing in reverberation.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Acoustic feature conversion using a polynomial based feature transferring algorithm.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Spectral patch based sparse coding for acoustic event detection.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Ensemble modeling of denoising autoencoder for speech spectrum restoration.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic speech recognition with primarily temporal envelope information.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Clustering-based i-vector formulation for speaker recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An adaptive envelope compression strategy for speech processing in cochlear implants.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Ensemble of machine learning algorithms for cognitive and physical speaker load detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A Transfer Probabilistic Collective Factorization Model to Handle Sparse Data in Collaborative Filtering.
Proceedings of the 2014 IEEE International Conference on Data Mining, 2014

Sparse representation based on a bag of spectral exemplars for acoustic event detection.
Proceedings of the IEEE International Conference on Acoustics, 2014

Speech enhancement using segmental nonnegative matrix factorization.
Proceedings of the IEEE International Conference on Acoustics, 2014

Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
結合I-Vector 及深層神經網路之語者驗證系統 (Text-independent Speaker Verification using a Hybrid I-Vector/DNN Approach) [In Chinese].
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing, 2013

Evaluation of generalized maximum a posteriori spectral amplitude (GMAPA) speech enhancement algorithm in hearing aids.
Proceedings of the IEEE International Symposium on Consumer Electronics, 2013

Recurrent neural network based language model personalization by social network crowdsourcing.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speech enhancement based on deep denoising autoencoder.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Sparse maximum entropy deep belief nets.
Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

Semantic Naïve Bayes Classifier for Document Classification.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Filtering on the temporal probability sequence in histogram equalization for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Speech enhancement using generalized maximum a posteriori spectral amplitude estimator.
Proceedings of the IEEE International Conference on Acoustics, 2013

Robust Wi-Fi location fingerprinting against device diversity based on spatial mean normalization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Incorporating global variance in the training phase of GMM-based voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
A study on cepstral sub-band normalization for robust ASR.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Acoustic space partition based on broad phonetic class for ensemble acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Exploring mutual information for GMM-based spectral conversion.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A Study of Mutual Information for GMM-Based Spectral Conversion.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A linear projection approach to environment modeling for robust speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Incorporating Regional Information to Enhance MAP-Based Stochastic Feature Compensation for Robust Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A sampling-based environment population projection approach for rapid acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
An environment structuring framework to facilitating suitable prior density estimation for MAPLR on robust speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A particle filter feature compensation approach to robust speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Shrinkage model adaptation in automatic speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An acoustic segment model approach to incorporating temporal information into speaker modeling for text-independent speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2009

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009

A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
Proceedings of the IEEE International Conference on Acoustics, 2009

MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
An ensemble speaker and speaking environment modeling approach to robust speech recognition.
PhD thesis, 2008

Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A programmable analog radial-basis-function based classifier.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
An ensemble modeling approach to joint characterization of speaker and speaking environments.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Detection-based ASR in the automatic speech attribute transcription project.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
A vector space approach to environment modeling for robust speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A study on detection based automatic speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Segmental eigenvoice with delicate eigenspace for improved speaker adaptation.
IEEE Trans. Speech Audio Process., 2005

A study on separation between acoustic models and its applications.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2001
Segmental eigenvoice for rapid speaker adaptation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

