Hsin-Min Wang
ORCID: 0000-0003-3599-5071
Affiliations:
- Academia Sinica, Taipei, Taiwan
- National Taiwan University, Taipei, Taiwan (PhD 1995)
According to our database, Hsin-Min Wang authored at least 352 papers between 1993 and 2024.
Bibliography
2024
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing.
CoRR, 2024
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition.
CoRR, 2024
Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation.
CoRR, 2024
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via CLIP and Speech-Image Data.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE Signal Process. Lett., 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection.
CoRR, 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection.
CoRR, 2023
Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
CoRR, 2023
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm.
CoRR, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE Signal Process. Lett., 2022
A Teacher-student Framework for Unsupervised Speech Enhancement Using Noise Remixing Training and Two-stage Inference.
CoRR, 2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN.
CoRR, 2022
Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing, 2022
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup.
IEEE Trans. Multim., 2021
CoRR, 2021
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing, 2021
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing, 2021
Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021
MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Melody Harmonization Using Orderless NADE, Chord Balancing, and Blocked Gibbs Sampling.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the 29th European Signal Processing Conference, 2021
Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Improvement of Spatial Ambiguity in Multi-Channel Speech Separation Using Channel Attention.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
2020
Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion.
IEEE Trans. Emerg. Top. Comput. Intell., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement.
IEEE Signal Process. Lett., 2020
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020
CoRR, 2020
Using Taigi Dramas with Mandarin Chinese Subtitles to Improve Taigi Speech Recognition.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020
SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation.
CoRR, 2019
Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement.
CoRR, 2019
Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks.
CoRR, 2019
Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing, 2019
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the 27th European Signal Processing Conference, 2019
Proceedings of the 27th European Signal Processing Conference, 2019
Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Sequential Speaker Embedding and Transfer Learning for Text-Independent Speaker Identification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
IEEE Trans. Multim., 2018
IEEE Trans. Emerg. Top. Comput. Intell., 2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
J. Inf. Sci. Eng., 2018
WaveNet 聲碼器及其於語音轉換之應用 (WaveNet Vocoder and its Applications in Voice Conversion) [In Chinese].
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, 2018
Automatic Detection of Speech Under Cold Using Discriminative Autoencoders and Strength Modeling with Multiple Sub-Dictionary Generation.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
Proceedings of the Emotions and Personality in Personalized Services, 2017
A Position-Aware Language Modeling Framework for Extractive Broadcast News Speech Summarization.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2017
Int. J. Comput. Linguistics Chin. Lang. Process., 2017
Int. J. Comput. Linguistics Chin. Lang. Process., 2017
An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017
Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network.
CoRR, 2017
基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017
使用查詢意向探索與類神經網路於語音文件檢索之研究 (Exploring Query Intent and Neural Network modeling Techniques for Spoken Document Retrieval) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017
基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017
Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing.
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
A locality-preserving essence vector modeling framework for spoken document retrieval.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
Alignment of Lyrics With Accompanied Singing Audio Based on Acoustic-Phonetic Vowel Likelihood Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Exploring the use of unsupervised query modeling techniques for speech recognition and summarization.
Speech Commun., 2016
運用序列到序列生成架構於重寫式自動摘要 (Exploiting Sequence-to-Sequence Generation Framework for Automatic Abstractive Summarization) [In Chinese].
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing, 2016
Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Exploring Word Mover's Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
DEMV-matchmaker: Emotional temporal course representation and deep similarity matching for automatic music video generation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the COLING 2016, 2016
Exploiting graph regularized nonnegative matrix factorization for extractive speech summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
Combining Relevance Language Modeling and Clarity Measure for Extractive Speech Summarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2015
IEEE Trans. Affect. Comput., 2015
Int. J. Comput. Linguistics Chin. Lang. Process., 2015
Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2015
Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis.
CoRR, 2015
表示法學習技術於節錄式語音文件摘要之研究 (A Study on Representation Learning Techniques for Extractive Spoken Document Summarization) [In Chinese].
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, 2015
調變頻譜分解之改良於強健性語音辨識 (Several Refinements of Modulation Spectrum Factorization for Robust Speech Recognition) [In Chinese].
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, 2015
EMV-matchmaker: Emotional Temporal Course Modeling and Matching for Automatic Music Video Generation.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Incorporating paragraph embeddings and density peaks clustering for spoken document summarization.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Incorporating proximity information in relevance language modeling for extractive speech summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
2014
IEEE Trans. Knowl. Data Eng., 2014
探究新穎語句模型化技術於節錄式語音摘要 (Investigating Novel Sentence Modeling Techniques for Extractive Speech Summarization) [In Chinese].
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing, 2014
Automatic Set List Identification and Song Segmentation for Full-Length Concert Videos.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014
Enhanced language modeling for extractive speech summarization with sentence relatedness information.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Ensemble of machine learning algorithms for cognitive and physical speaker load detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
A recurrent neural network language modeling framework for extractive speech summarization.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Effective pseudo-relevance feedback for language modeling in extractive speech summarization.
Proceedings of the IEEE International Conference on Acoustics, 2014
Speaker verification using kernel-based binary classifiers with binary operation derived features.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
Emotion recognition of conversational affective speech using temporal course modeling-based error weighted cross-correlation model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
改良語句模型技術於節錄式語音摘要之研究 (Improved Sentence Modeling Techniques for Extractive Speech Summarization) [In Chinese].
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing, 2013
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2013
Proceedings of the ACM Multimedia Conference, 2013
Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013
Subspace-based phonotactic language recognition using multivariate dynamic linear models.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, 2013
2012
Spoken Document Retrieval Leveraging Unsupervised and Supervised Topic Modeling Techniques.
IEICE Trans. Inf. Syst., 2012
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2012
The acoustic emotion gaussians model for emotion-based music annotation and retrieval.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Exploring the relationship between categorical and dimensional emotion semantics of music.
Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 21st International Conference on Pattern Recognition, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
IEEE Trans. Multim., 2011
Proceedings of the Advances in Multimedia Modeling, 2011
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011
Learning the Similarity of Audio Music in Bag-of-frames Representation from Tagged Music Data.
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
2010
Fast min-hashing indexing and robust spatio-temporal matching for detecting video copies.
ACM Trans. Multim. Comput. Commun. Appl., 2010
Time-Series Linear Search for Video Copies Based on Compact Signature Manipulation and Containment Relation Modeling.
IEEE Trans. Circuits Syst. Video Technol., 2010
BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization.
IEEE Trans. Speech Audio Process., 2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
A Discriminative and Heteroscedastic Linear Feature Transformation for Multiclass Classification.
Proceedings of the 20th International Conference on Pattern Recognition, 2010
Homogeneous segmentation and classifier ensemble for audio tag annotation and retrieval.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010
Proceedings of the International Conference on Image Processing, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
IEEE Trans. Neural Networks, 2009
A Probabilistic Generative Framework for Extractive Broadcast News Speech Summarization.
IEEE Trans. Speech Audio Process., 2009
A Comparative Study of Probabilistic Ranking Models for Chinese Spoken Document Summarization.
ACM Trans. Asian Lang. Inf. Process., 2009
Improving the characterization of the alternative hypothesis via minimum verification error training with applications to speaker verification.
Pattern Recognit., 2009
IEICE Trans. Commun., 2009
Comput. Speech Lang., 2009
Comput. Speech Lang., 2009
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009
2008
Using Kernel Discriminant Analysis to Improve the Characterization of the Alternative Hypothesis for Speaker Verification.
IEEE Trans. Speech Audio Process., 2008
Using the Similarity of Main Melodies to Identify Cover Versions of Popular Songs for Music Document Retrieval.
J. Inf. Sci. Eng., 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
A comparative study of probabilistic ranking models for spoken document summarization.
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation.
IEEE Trans. Speech Audio Process., 2007
Int. J. Speech Technol., 2007
A Novel Characterization of the Alternative Hypothesis Using Kernel Discriminant Analysis for LLR-Based Speaker Verification.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
A unified probabilistic generative framework for extractive spoken document summarization.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Improved Methods for Characterizing the Alternative Hypothesis using Minimum Verification Error Training for LLR-Based Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals.
IEEE Trans. Speech Audio Process., 2006
An Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006
Int. J. Comput. Linguistics Chin. Lang. Process., 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Automatic Construction of Regression Class Tree for MLLR Via Model-Based Hierarchical Clustering.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
On Using Entropy Information to Improve Posterior Probability-Based Confidence Measures.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
A Novel Alternative Hypothesis Characterization Using Kernel Classifiers for LLR-Based Speaker Verification.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Improving the characterization of the alternative hypothesis via kernel discriminant analysis for likelihood ratio-based speaker verification.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006
A Kernel-based Discrimination Framework for Solving Hypothesis Testing Problems with Application to Speaker Verification.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006
On Maximizing the Within-Cluster Homogeneity of Speaker Voice Characteristics For Speech Utterance Clustering.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Proceedings of the Information Retrieval Technology, 2006
2005
Int. J. Comput. Linguistics Chin. Lang. Process., 2005
On the extraction of vocal-related information to facilitate the management of popular music collections.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2005
Query-By-Example Technique for Retrieving Cover Versions of Popular Songs with Similar Melodies.
Proceedings of the ISMIR 2005, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
An Efficient Approach to Multimodal Person Identity Verification by Fusing Face and Voice Information.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005
Prototype Systems for Retrieving Polyphonic Objects of Popular Music Based on Query-by-singing/example.
Proceedings of the 3rd International Conference on Digital Archive Technologies, 2005
SoVideo - A Mandarin Chinese Broadcast Retrieval System.
Proceedings of the 3rd International Conference on Digital Archive Technologies, 2005
Clustering Speech Utterances by Speaker Using Eigenvoice-Motivated Vector Space Models.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Proceedings of the Information Retrieval Technology, 2005
2004
ACM Trans. Asian Lang. Inf. Process., 2004
Int. J. Speech Technol., 2004
A Model-Selection-Based Self-Splitting Gaussian Mixture Learning with Application to Speaker Identification.
EURASIP J. Adv. Signal Process., 2004
Comput. Speech Lang., 2004
Comput. Music. J., 2004
藍芽無線環境下中文語音辨識效能之評估與分析 (Performance Evaluation and Analysis of Mandarin Speech Recognition over Bluetooth Communication Environments) [In Chinese].
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing, 2004
Proceedings of the ISMIR 2004, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
A maximum entropy approach for integrating semantic information in statistical language models.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Speaker clustering of speech utterances using a voice characteristic reference space.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
Proceedings of the ISMIR 2003, 2003
Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
A sequential metric-based audio segmentation method via the Bayesian information criterion.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
2002
Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese.
IEEE Trans. Speech Audio Process., 2002
A hierarchical tag-graph search scheme with layered grammar rules for spontaneous speech understanding.
Pattern Recognit. Lett., 2002
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
2001
Int. J. Comput. Process. Orient. Lang., 2001
Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval.
Proceedings of the Advances in Multimedia Information Processing, 2001
Proceedings of the First International Conference on Human Language Technology Research, 2001
Comparative analysis for data-driven temporal filters obtained via principal component analysis (PCA) and linear discriminant analysis (LDA) in speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
An HMM/n-gram-based linguistic processing approach for Mandarin spoken document retrieval.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Proceedings of the IEEE International Conference on Acoustics, 2001
Eigenspace-based maximum a posteriori linear regression for rapid speaker adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2001
2000
Experiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese.
Speech Commun., 2000
Pattern Recognit. Lett., 2000
J. Am. Soc. Inf. Sci., 2000
Int. J. Pattern Recognit. Artif. Intell., 2000
Int. J. Comput. Process. Orient. Lang., 2000
Initial Experiments On Recognition of Internet-Accessible Compressed Mandarin Speech.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000
Automatic metric-based speech segmentation for broadcast news via principal component analysis.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Retrieval of broadcast news speech in Mandarin Chinese collected in Taiwan using syllable-level statistical characteristics.
Proceedings of the IEEE International Conference on Acoustics, 2000
1999
Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition.
Comput. Speech Lang., 1999
A New Syllable-based Approach for Retrieving Mandarin Spoken Documents Using Short Speech Queries.
Proceedings of the 12th Research on Computational Linguistics Conference, 1999
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999
1998
Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus.
Int. J. Comput. Linguistics Chin. Lang. Process., 1998
Large-Vocabulary Chinese Text/Speech Information Retrieval Using Mandarin Speech Queries.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998
Hierarchical tag-graph search for spontaneous speech understanding in spoken dialog systems.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998
1997
Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary using limited training data.
IEEE Trans. Speech Audio Process., 1997
Internet Chinese information retrieval using unconstrained Mandarin speech queries based on a client-server architecture and a PAT-tree-based language model.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997
1996
Frameworks for recognition of Mandarin syllables with tones using sub-syllabic units.
Speech Commun., 1996
1995
Fast and accurate continuous speech recognition for Chinese language with very large vocabulary.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995
Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data.
Proceedings of the 1995 International Conference on Acoustics, 1995
1994
Incremental speaker adaptation using phonetically balanced training sentences for Mandarin syllable recognition based on segmental probability models.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994
An initial study on a segmental probability model approach to large-vocabulary continuous Mandarin speech recognition.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994
1993
從中文語料庫中自動選取連續國語語音特性平衡句的方法 (Automatic Selection of Phonetically Rich Sentences from A Chinese Text Corpus) [In Chinese].
Proceedings of Rocling Computational Linguistics Conference VI, 1993
Golden Mandarin (II) - an improved single-chip real-time Mandarin dictation machine for Chinese language with very large vocabulary.
Proceedings of the IEEE International Conference on Acoustics, 1993