Soumi Maiti
Orcid: 0000-0001-6940-0115
According to our database, Soumi Maiti authored at least 33 papers between 2017 and 2024.
Timeline
[Chart: publications per year, 2017 to 2024, by type (Book, In proceedings, Article, PhD thesis, Dataset, Other)]
Bibliography
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data.
CoRR, 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
2023
CoRR, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023
2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
2021
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
2017
Proceedings of the Advanced Social Interaction with Agents, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017