Jiatong Shi

ORCID: 0000-0002-9050-8304

According to our database, Jiatong Shi authored at least 73 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

A Large-Scale Evaluation of Speech Foundation Models.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction.

CoRR, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.

CoRR, 2024

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models.

CoRR, 2024

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.

CoRR, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.

Vanya Bannihatti Kumar

CoRR, 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens.

CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

CoRR, 2024

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan.

CoRR, 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.

Taiqi He

Kwanghee Choi

Lindia Tjuatja

Nathaniel R. Robinson

CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Ahmed Hussen Abdelaziz

Shinji Watanabe

CoRR, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

CoRR, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

An iteration-based interactive attention network for 3D point cloud registration.

Neurocomputing, December, 2023

A dynamic graph aggregation framework for 3D point cloud registration.

Feilong Cao

Jiatong Shi

Chenglin Wen

Eng. Appl. Artif. Intell., April, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.

CoRR, 2023

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model.

CoRR, 2023

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.

CoRR, 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

CoRR, 2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.

CoRR, 2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis.

CoRR, 2023

The Singing Voice Conversion Challenge 2023.

Wen-Chin Huang

Lester Phillip Violeta

CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.

CoRR, 2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Findings of the IWSLT 2023 Evaluation Campaign.

Sweta Agrawal

Antonios Anastasopoulos

Alexandra Chronopoulou

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploration on HuBERT with Multiple Resolutions.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.

Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.

Proceedings of the IEEE International Conference on Acoustics, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

EURO: ESPnet Unsupervised ASR Open-Source Toolkit.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Singing Voice Conversion Challenge 2023.

Wen-Chin Huang

Lester Phillip Violeta

Songxiang Liu

Jiatong Shi

Tomoki Toda

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.

Comput. Speech Lang., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.

CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Findings of the IWSLT 2022 Evaluation Campaign.

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.

Dan Berrebbi

Jiatong Shi

Brian Yan

Osbel López-Francisco

Jonathan D. Amith

Shinji Watanabe

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.

Proceedings of the IEEE International Conference on Acoustics, 2022

Training Strategies for Automatic Song Writing: A Unified Framework Perspective.

Proceedings of the IEEE International Conference on Acoustics, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Leveraging deep learning with audio analytics to predict the success of crowdfunding projects.

J. Supercomput., 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

CoRR, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.

Proceedings of the 18th International Conference on Spoken Language Translation, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.

Proceedings of the IEEE International Conference on Acoustics, 2021

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss.

Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on ESPnet Toolkit Boosted by Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.

Jiatong Shi

Jonathan D. Amith

Rey Castillo García

Esteban Guadalupe Sierra

Kevin Duh

Shinji Watanabe

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.

Louis-Philippe Morency

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.

Jiatong Shi

Nan Huo

Qin Jin

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2018

Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp.

Jiatong Shi

Wei Du

Wei Xu

Proceedings of the 22nd Pacific Asia Conference on Information Systems, 2018
