Yashesh Gaur

According to our database, Yashesh Gaur authored at least 54 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech.
CoRR, 2024

Speech ReaLLM - Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time.
CoRR, 2024

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023

LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CTCBERT: Advancing Hidden-Unit Bert with CTC Objectives.
Proceedings of the IEEE International Conference on Acoustics, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.
CoRR, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.
CoRR, 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding.
CoRR, 2022

Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Dynamic Gradient Aggregation for Federated Domain Adaptation.
CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ensemble Combination between Different Time Segmentations.
Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Federated Transfer Learning with Dynamic Gradient Aggregation.
CoRR, 2020

Combination of End-to-End and Hybrid Models for Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Sequence-Level Self-Learning with Multiple Hypotheses.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Federated Approach in Training Acoustic Models.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speaker Adaptation for Attention-Based End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic-to-Phrase Models for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Character-Aware Attention-Based End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Robust Speech Recognition Using Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Reducing Bias in Production Speech Models.
CoRR, 2017

Exploring Neural Transducers for End-to-End Speech Recognition.
CoRR, 2017

Exploring neural transducers for end-to-end speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
The effects of automatic speech recognition quality on human transcription latency.
Proceedings of the 13th Web for All Conference, 2016

Manipulating Word Lattices to Incorporate Human Corrections.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
Using keyword spotting to help humans correct captioning faster.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The Effects of Automatic Speech Recognition Quality on Human Transcription Latency.
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

2013
Speaker Recognition Using Sparse Representation via Superimposed Features.
Proceedings of the Pattern Recognition and Machine Intelligence, 2013

Algorithms for speech segmentation at syllable-level for text-to-speech synthesis system in Gujarati.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

