Rama Sanand Doddipatla

Orcid: 0000-0003-1061-9512

According to our database, Rama Sanand Doddipatla authored at least 63 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding.
CoRR, 2024

Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization of Meeting Scenarios.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Accented Speech Recognition Using Data Augmentation Based on Unsupervised Text-to-Speech Synthesis.
Proceedings of the 32nd European Signal Processing Conference, 2024

Advancing Faithfulness of Large Language Models in Goal-Oriented Dialogue Question Answering.
Proceedings of the ACM Conversational User Interfaces 2024, 2024

Semantic Map-based Generation of Navigation Instructions.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues.
CoRR, 2023

Adversarial learning of neural user simulators for dialogue policy optimisation.
CoRR, 2023

Domain Adaptive Self-supervised Training of Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Teacher-Student Approach for Extracting Informative Speaker Embeddings From Speech Mixtures.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

On the Effectiveness of Monoaural Target Source Extraction for Distant end-to-end Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Cumulative Attention Based Streaming Transformer ASR with Internal Language Model Joint Training and Rescoring.
Proceedings of the IEEE International Conference on Acoustics, 2023

Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2023

Enabling Semi-Structured Knowledge Access via a Question-Answering Module in Task-oriented Dialogue Systems.
Proceedings of the 5th International Conference on Conversational User Interfaces, 2023

Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Factors in Emotion Recognition With Deep Learning Models Using Speech and Text on Multiple Corpora.
IEEE Signal Process. Lett., 2022

Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Combining Structured and Unstructured Knowledge in an Interactive Search Dialogue System.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Monaural Source Separation: From Anechoic To Reverberant Environments.
Proceedings of the 17th International Workshop on Acoustic Signal Enhancement, 2022

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transformer-Based Streaming ASR with Cumulative Attention.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
An Investigation into the Multi-channel Time Domain Speaker Extraction Network.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Transformer-Based Online Speech Recognition with Decoder-end Adaptive Computation Steps.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Towards Handling Unconstrained User Preferences in Dialogue.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Teacher-Student MixIT for Unsupervised and Semi-Supervised Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2021

Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.
Proceedings of the IEEE International Conference on Acoustics, 2021

Action State Update Approach to Dialogue Management.
Proceedings of the IEEE International Conference on Acoustics, 2021

Head-Synchronous Decoding for Transformer-Based Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Improving HS-DACS Based Streaming Transformer ASR with Deep Reinforcement Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Dialogue Strategy Adaptation to New Action Sets Using Multi-Dimensional Modelling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
On Reducing the Effect of Speaker Overlap for Chime-5.
Proceedings of the IEEE International Conference on Acoustics, 2019

An Unsupervised Learning Approach to Neural-net-supported Wpe Dereverberation.
Proceedings of the IEEE International Conference on Acoustics, 2019

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2017
Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Speaker adaptive training in deep neural networks using speaker dependent bottleneck features.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
The USFD Spoken Language Translation System for IWSLT 2014.
CoRR, 2015

Noise-matched training of CRF based sentence end detection models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
The USFD SLT system for IWSLT 2014.
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2014, 2014

Multi-pass sentence-end detection of lecture speech.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013
Non-negative durational HMM.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013

Synthetic speaker models using VTLN to improve the performance of children in mismatched speaker conditions for ASR.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012
VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC.
IEEE Trans. Speech Audio Process., 2012

Creating synthetic voices for children by adapting adult average voice using stacked transformations and VTLN.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Revisiting VTLN using linear transformation on conventional MFCC.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009
A study on the influence of covariance adaptation on jacobian compensation in vocal tract length normalization.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Characterizing speaker variability using spectral envelopes of vowel sounds.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Study of jacobian compensation using linear transformation of conventional MFCC for VTLN.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007
Linear transformation approach to VTLN using dynamic frequency warping.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Speaker-Invariant Features for Automatic Speech Recognition.
Proceedings of the IJCAI 2007, 2007

