Nithin Rao Koluguri

According to our database, Nithin Rao Koluguri authored at least 21 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2024

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR.

…, …, …, …, …, Ivan Medennikov, …, Nithin Rao Koluguri, Jagadeesh Balam, …

CoRR, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens.

…, Ivan Medennikov, …, …, …, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, …

CoRR, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.

Nithin Rao Koluguri, Travis M. Bartley, …, Oleksii Hrinchuk, Jagadeesh Balam, …, …

CoRR, 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks.

…, …, …, Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, …, Jagadeesh Balam, …

CoRR, 2024

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations.

…, Nithin Rao Koluguri, …, …, Jagadeesh Balam, …

CoRR, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5.

…, …, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, …, Jagadeesh Balam, …

CoRR, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

Krishna C. Puvvada, …, …, Oleksii Hrinchuk, Nithin Rao Koluguri, …, Somshubra Majumdar, Elena Rastorgueva, …, Vitaly Lavrukhin, Jagadeesh Balam, …

CoRR, 2024

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition.

Krishna C. Puvvada, Nithin Rao Koluguri, …, Jagadeesh Balam, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach.

…, …, Nithin Rao Koluguri, Jagadeesh Balam

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Investigating End-to-End ASR Architectures for Long Form Audio Transcription.

Nithin Rao Koluguri, …, Georgy Zelenfroind, Somshubra Majumdar, …, …, Jagadeesh Balam, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.

…, …, …, …, Krishna C. Puvvada, Nithin Rao Koluguri, …, Aleksandr Laptev, Jagadeesh Balam, …

CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.

…, …, …, Nithin Rao Koluguri, …, …, Jagadeesh Balam, …

CoRR, 2023

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification.

…, Nithin Rao Koluguri, Jagadeesh Balam, …

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.

…, Nithin Rao Koluguri, …, Somshubra Majumdar, …, …, Oleksii Hrinchuk, Krishna C. Puvvada, …, Jagadeesh Balam, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

AmberNet: A Compact End-to-End Model for Spoken Language Identification.

…, Nithin Rao Koluguri, Jagadeesh Balam, …

CoRR, 2022

NeMo Open Source Speaker Diarization System.

…, Nithin Rao Koluguri, …, Jagadeesh Balam, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting.

…, Nithin Rao Koluguri, Jagadeesh Balam, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context.

Nithin Rao Koluguri, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2020

Meta-Learning for Robust Child-Adult Classification from Speech.

Nithin Rao Koluguri, …, …, …, Shrikanth Narayanan

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis.

…, …, Nithin Rao Koluguri, …, …, Atchayaram Nalini, …, …, Prasanta Kumar Ghosh

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2017

Spectrogram Enhancement Using Multiple Window Savitzky-Golay (MWSG) Filter for Robust Bird Sound Detection.

Nithin Rao Koluguri, G. Nisha Meenakshi, Prasanta Kumar Ghosh

IEEE ACM Trans. Audio Speech Lang. Process., 2017
