Jiatong Shi

ORCID: 0000-0002-9050-8304

According to our database, Jiatong Shi authored at least 88 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2024
A Large-Scale Evaluation of Speech Foundation Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Exploiting Longitudinal Speech Sessions via Voice Assistant Systems for Early Detection of Cognitive Decline.
CoRR, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech.
CoRR, 2024

FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists.
CoRR, 2024

Preference Alignment Improves Language Model-Based TTS.
CoRR, 2024

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration.
CoRR, 2024

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.
CoRR, 2024

Self-supervised Speech Representations Still Struggle with African American Vernacular English.
CoRR, 2024

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction.
CoRR, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.
CoRR, 2024

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models.
CoRR, 2024

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.
CoRR, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.
CoRR, 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens.
CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.
CoRR, 2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.
CoRR, 2024

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.
CoRR, 2024

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan.
CoRR, 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.
CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.
CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.
CoRR, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
CoRR, 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
An iteration-based interactive attention network for 3D point cloud registration.
Neurocomputing, December, 2023

A dynamic graph aggregation framework for 3D point cloud registration.
Eng. Appl. Artif. Intell., April, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
CoRR, 2023

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.
CoRR, 2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis.
CoRR, 2023

The Singing Voice Conversion Challenge 2023.
CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.
CoRR, 2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023


4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploration on HuBERT with Multiple Resolution.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Euro: Espnet Unsupervised ASR Open-Source Toolkit.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Singing Voice Conversion Challenge 2023.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.
CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022


Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Training Strategies for Automatic Song Writing: A Unified Framework Perspective.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Leveraging deep learning with audio analytics to predict the success of crowdfunding projects.
J. Supercomput., 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Recent Developments on Espnet Toolkit Boosted By Conformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2018
Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp.
Proceedings of the 22nd Pacific Asia Conference on Information Systems, 2018

