Tom Ko

Orcid: 0000-0002-5324-8961

According to our database, Tom Ko authored at least 60 papers between 2008 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent.

CoRR, 2024

PolyVoice: Language Models for Speech to Speech Translation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Parameter-Efficient Transfer Learning for End-to-end Speech Translation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

RepCodec: A Speech Representation Codec for Speech Tokenization.

Zhichao Huang

Chutong Meng

Tom Ko

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Selective Prompting Tuning for Personalized Conversations with LLMs.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Speech Translation with Large Language Models: An Industrial Practice.

CoRR, 2023

PolyVoice: Language Models for Speech to Speech Translation.

CoRR, 2023

Findings of the IWSLT 2023 Evaluation Campaign.

Sweta Agrawal

Antonios Anastasopoulos

Alexandra Chronopoulou

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

GigaST: A 10,000-hour Pseudo Speech Translation Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Recent Advances in Direct Speech-to-text Translation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

M³ST: Mix at Three Levels for Speech Translation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Learning Retrieval Augmentation for Personalized Dialogue Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Leveraging per Image-Token Consistency for Vision-Language Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DUB: Discrete Unit Back-translation for Speech Translation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

MOSPC: MOS Prediction Based on Pairwise Comparison.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

CTC-based Non-autoregressive Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Personalized Dialogue Generation with Persona-Adaptive Attention.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

M3ST: Mix at Three Levels for Speech Translation.

CoRR, 2022

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis.

Qibing Bai

Tom Ko

Yu Zhang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring Machine Speech Chain For Domain Adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-View Self-Attention Based Transformer for Speaker Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.

CoRR, 2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation.

CoRR, 2021

An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition.

Fengpeng Yue

Tom Ko

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network.

Junyi Ao

Tom Ko

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Token-Level Supervised Contrastive Learning for Punctuation Restoration.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples.

Yangbin Chen

Tom Ko

Jianping Wang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Encoder-Decoder Based Audio Captioning System with Transfer and Reinforcement Learning.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

CL4AC: A Contrastive Loss for Audio Captioning.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Investigation of Few-Shot Learning in Spoken Term Classification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Prototypical Networks for Small Footprint Text-Independent Speaker Verification.

Tom Ko

Yangbin Chen

Qing Li

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Mixup Learning Strategies for Text-Independent Speaker Verification.

Yingke Zhu

Tom Ko

Brian Mak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Meta Learning for Few-shot Keyword Spotting.

CoRR, 2018

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Long Distance Voice Channel Diagnosis Using Deep Neural Networks.

Zhen Qin

Tom Ko

Guangjian Tian

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

A study on data augmentation of reverberant speech for robust speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

An empirical exploration of CTC acoustic models.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Audio augmentation for speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

JHU ASpIRE system: Robust LVCSR with TDNNS, iVector adaptation and RNN-LMS.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Eigentrigraphemes for under-resourced languages.

Tom Ko

Brian Kan-Wing Mak

Speech Commun., 2014

Modeling inter-cluster and intra-cluster discrimination among triphones.

Tom Ko

Brian Kan-Wing Mak

Dongpeng Chen

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Subspace Gaussian mixture model with state-dependent subspace dimensions.

Tom Ko

Brian Mak

Cheung-Chi Leung

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Eigentriphones for Context-Dependent Acoustic Modeling.

Tom Ko

Brian Mak

IEEE Trans. Speech Audio Process., 2013

2012

Derivation of eigentriphones by weighted principal component analysis.

Tom Ko

Brian Mak

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

A Fully Automated Derivation of State-Based Eigentriphones for Triphone Modeling with No Tied States Using Regularization.

Tom Ko

Brian Mak

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Eigentriphones: A basis for context-dependent acoustic modeling.

Tom Ko

Brian Mak

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Problems of modeling phone deletion in conversational speech for speech recognition.

Brian Mak

Tom Ko

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Improving speech recognition by explicit modeling of phone deletions.

Tom Ko

Brian Mak

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Automatic estimation of decoding parameters using large-margin iterative linear programming.

Brian Mak

Tom Ko

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008

Min-max discriminative training of decoding parameters using iterative linear programming.

Brian Mak

Tom Ko

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
