Tom Ko

ORCID: 0000-0002-5324-8961

According to our database, Tom Ko authored at least 60 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent.
CoRR, 2024

PolyVoice: Language Models for Speech to Speech Translation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Parameter-Efficient Transfer Learning for End-to-end Speech Translation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

RepCodec: A Speech Representation Codec for Speech Tokenization.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Selective Prompting Tuning for Personalized Conversations with LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Speech Translation with Large Language Models: An Industrial Practice.
CoRR, 2023

PolyVoice: Language Models for Speech to Speech Translation.
CoRR, 2023


GigaST: A 10,000-hour Pseudo Speech Translation Corpus.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Recent Advances in Direct Speech-to-text Translation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

M³ST: Mix at Three Levels for Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Learning Retrieval Augmentation for Personalized Dialogue Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Leveraging per Image-Token Consistency for Vision-Language Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DUB: Discrete Unit Back-translation for Speech Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

MOSPC: MOS Prediction Based on Pairwise Comparison.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

CTC-based Non-autoregressive Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Personalized Dialogue Generation with Persona-Adaptive Attention.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
M3ST: Mix at Three Levels for Speech Translation.
CoRR, 2022

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring Machine Speech Chain For Domain Adaptation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Multi-View Self-Attention Based Transformer for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.
CoRR, 2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation.
CoRR, 2021

An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

Token-Level Supervised Contrastive Learning for Punctuation Restoration.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

An Encoder-Decoder Based Audio Captioning System with Transfer and Reinforcement Learning.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

CL4AC: A Contrastive Loss for Audio Captioning.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Investigation of Few-Shot Learning in Spoken Term Classification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Prototypical Networks for Small Footprint Text-Independent Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Mixup Learning Strategies for Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Meta Learning for Few-shot Keyword Spotting.
CoRR, 2018

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Long Distance Voice Channel Diagnosis Using Deep Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017
A study on data augmentation of reverberant speech for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
An empirical exploration of CTC acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015
Audio augmentation for speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Eigentrigraphemes for under-resourced languages.
Speech Commun., 2014

Modeling inter-cluster and intra-cluster discrimination among triphones.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Subspace Gaussian mixture model with state-dependent subspace dimensions.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Eigentriphones for Context-Dependent Acoustic Modeling.
IEEE Trans. Speech Audio Process., 2013

2012
Derivation of eigentriphones by weighted principal component analysis.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
A Fully Automated Derivation of State-Based Eigentriphones for Triphone Modeling with No Tied States Using Regularization.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Eigentriphones: A basis for context-dependent acoustic modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010
Problems of modeling phone deletion in conversational speech for speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Improving speech recognition by explicit modeling of phone deletions.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009
Automatic estimation of decoding parameters using large-margin iterative linear programming.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008
Min-max discriminative training of decoding parameters using iterative linear programming.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

