Soumi Maiti

Orcid: 0000-0001-6940-0115

According to our database, Soumi Maiti authored at least 33 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

[Timeline chart: publications per year, 2017 to 2024. Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other]

Bibliography

2024

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data.

CoRR, 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.

CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.

CoRR, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.

CoRR, 2024

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech.

CoRR, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study.

CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.

Proceedings of the IEEE International Conference on Acoustics, 2023

FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TriniTTS: Pitch-controllable End-to-end TTS without External Aligner.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Speech Enhancement Using Speech Synthesis Techniques.

Soumi Maiti

PhD thesis, 2021

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data.

Soumi Maiti, Erik Marchi, Alistair Conkie

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.

Soumi Maiti, Michael I. Mandel

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Parametric Resynthesis With Neural Vocoders.

Soumi Maiti, Michael I. Mandel

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Denoising by Parametric Resynthesis.

Soumi Maiti, Michael I. Mandel

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Large Vocabulary Concatenative Resynthesis.

Soumi Maiti, Joey Ching, Michael I. Mandel

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Predicting Interaction Quality in Customer Service Dialogs.

Svetlana Stoyanchev, Soumi Maiti, Srinivas Bangalore

Proceedings of the Advanced Social Interaction with Agents, 2017

Concatenative Resynthesis Using Twin Networks.

Soumi Maiti, Michael I. Mandel

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
