Hoirin Kim

Orcid: 0000-0002-8787-6982

According to our database, Hoirin Kim authored at least 84 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition.
CoRR, 2024

One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection.
CoRR, 2024

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination.
CoRR, 2022

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning.
CoRR, 2022

Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech.
IEEE Access, 2022

ACNN-VC: Utilizing Adaptive Convolution Neural Network for One-Shot Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Models.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervised Attention for Speaker Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Single-Variable-Input Active Sidelobe Suppression Method for Synthesized Magnetic Field Focusing Technology and Its Optimization.
IEEE Trans. Ind. Electron., 2020

Cross-Informed Domain Adversarial Training for Noise-Robust Wake-Up Word Detection.
IEEE Signal Process. Lett., 2020

Interlayer Selective Attention Network for Robust Personalized Wake-Up Word Detection.
IEEE Signal Process. Lett., 2020

Perceptually Guided End-to-End Text-to-Speech.
CoRR, 2020

Neural voice cloning with a few low-quality samples.
CoRR, 2020

Pitchtron: Towards audiobook generation from ordinary people's voices.
CoRR, 2020

Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification.
CoRR, 2020

Transductive Few-shot Learning with Meta-Learned Confidence.
CoRR, 2020

A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments.
IEEE Access, 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2-D Synthesized Magnetic Field Focusing Technology With Loop Coils Distributed in a Rectangular Formation.
IEEE Trans. Ind. Electron., 2019

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Additional Shared Decoder on Siamese Multi-View Encoders for Learning Acoustic Word Embeddings.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Self-Adaptive Soft Voice Activity Detection Using Deep Neural Networks for Robust Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Learning Self-Informed Feature Contribution for Deep Learning-Based Acoustic Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Learning acoustic word embeddings with phonetically associated triplet network.
CoRR, 2018

Joint Learning Using Denoising Variational Autoencoders for Voice Activity Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Development of distant multi-channel speech and noise databases for speech recognition by in-door conversational robots.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

Linear-scale filterbank for deep neural network-based voice activity detection.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

Deep Least Squares Regression for Speaker Adaptation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

CNN-based bottleneck feature for noise robust query-by-example spoken term detection.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Cross-acoustic transfer learning for sound event classification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust Speech Recognition.
IEEE Signal Process. Lett., 2015

Scaled norm-based Euclidean projection for sparse speaker adaptation.
EURASIP J. Adv. Signal Process., 2015

Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech emotion classification using tree-structured sparse logistic regression.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Discriminative likelihood score weighting based on acoustic-phonetic classification for speaker identification.
EURASIP J. Adv. Signal Process., 2014

Constrained MLE-based speaker adaptation with L1 regularization.
Proceedings of the IEEE International Conference on Acoustics, 2014

Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Robust detection of infant crying in adverse environments using weighted segmental two-dimensional linear frequency cepstral coefficients.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

Audio-Based Objectionable Content Detection Using Discriminative Transforms of Time-Frequency Dynamics.
IEEE Trans. Multim., 2012

Multiple Acoustic Model-Based Discriminative Likelihood Ratio Weighting for Voice Activity Detection.
IEEE Signal Process. Lett., 2012

Combination of Multiple Speech Dimensions for Automatic Assessment of Dysarthric Speech Intelligibility.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Automatic Assessment of Dysarthric Speech Intelligibility Based on Selected Phonetic Quality Features.
Proceedings of the Computers Helping People with Special Needs, 2012

Reliable likelihood ratios for statistical model-based voice activity detector with low false-alarm rate.
EURASIP J. Adv. Signal Process., 2011

Automatic extraction of pornographic contents using radon transform based audio features.
Proceedings of the 9th International Workshop on Content-Based Multimedia Indexing, 2011

Robust speaker recognition based on filtering in autocorrelation domain and sub-band feature recombination.
Pattern Recognit. Lett., 2010

Acoustic Model Combination Incorporated With Mask-Based Multi-Channel Source Separation for Automatic Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2010

Cepstral Domain Feature Extraction Utilizing Entropic Distance-Based Filterbank.
IEICE Trans. Inf. Syst., 2010

Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection.
IEICE Trans. Inf. Syst., 2010

Histogram Equalization to Model Adaptation for Robust Speech Recognition.
EURASIP J. Adv. Signal Process., 2010

Automatic detection of malicious sound using segmental two-dimensional mel-frequency cepstral coefficients and histograms of oriented gradients.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

A robust target signal detector based on statistical models using binaural cross-similarity information.
Proceedings of the 18th European Signal Processing Conference, 2010

Environmental Model Adaptation Based on Histogram Equalization.
IEEE Signal Process. Lett., 2009

The effectiveness of histogram equalization on environmental model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2009

Text-Independent Speaker Identification using Soft Channel Selection in Home Robot Environments.
IEEE Trans. Consumer Electron., 2008

Histogram Equalization Utilizing Window-Based Smoothed CDF Estimation for Feature Compensation.
IEICE Trans. Inf. Syst., 2008

Utterance Verification Using Word Voiceprint Models Based on Probabilistic Distributions of Phone-Level Log-Likelihood Ratio and Phone Duration.
IEICE Trans. Inf. Syst., 2008

Probabilistic Class Histogram Equalization for Robust Speech Recognition.
IEEE Signal Process. Lett., 2007

Noise Robust Speaker Identification Using Sub-Band Weighting in Multi-Band Approach.
IEICE Trans. Inf. Syst., 2007

Text-Independent Speaker Identification in a Distant-Talking Multi-Microphone Environment.
IEICE Trans. Inf. Syst., 2007

Response Time Reduction of Speech Recognizers Using Single Gaussians.
IEICE Trans. Inf. Syst., 2007

Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition.
EURASIP J. Adv. Signal Process., 2007

Reliable Speaker Identification Using Multiple Microphones in Ubiquitous Robot Companion Environment.
Proceedings of the IEEE RO-MAN 2007, 2007

Soft Counting Poisson Mixture Model-Based Polling Method for Speech/Nonspeech Classification.
IEICE Trans. Inf. Syst., 2006

Frequency Filtering for a Highly Robust Audio Fingerprinting Scheme in a Real-Noise Environment.
IEICE Trans. Inf. Syst., 2006

Intelligent broadcasting system and services for personalized semantic contents consumption.
Expert Syst. Appl., 2006

Composite Decision by Bayesian Inference in Distant-Talking Speech Recognition.
Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

A Music Summarization Scheme using Tempo Tracking and Two Stage Clustering.
Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006

Audio Fingerprinting Scheme by Temporal Filtering for Audio Identification Immune to Channel-Distortion.
Proceedings of the Information Retrieval Technology, 2005

Reliable Unseen Model Prediction for Vocabulary-Independent Speech Recognition.
Proceedings of the AI 2004: Advances in Artificial Intelligence, 2004

A Fast Utterance Verification Method for OOV Rejection.
Proceedings of the Signal and Image Processing (SIP 2003), 2003
