Mitesh M. Khapra

ORCID: 0009-0008-3687-9922

Affiliations:
  • Indian Institute of Technology Madras, India


According to our database, Mitesh M. Khapra authored at least 132 papers between 2009 and 2024.

Bibliography

2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams.
CoRR, 2024

Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs.
CoRR, 2024

IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS.
CoRR, 2024

Empowering Low-Resource Language ASR via Large-Scale Pseudo Labeling.
CoRR, 2024

LAHAJA: A Robust Multi-accent Benchmark for Evaluating Hindi ASR Systems.
CoRR, 2024

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings.
CoRR, 2024

Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies.
CoRR, 2024

An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models.
CoRR, 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.
CoRR, 2024

Airavata: Introducing Hindi Instruction-tuned LLM.
CoRR, 2024

An Empirical Analysis of In-context Learning Abilities of LLMs for MT.
CoRR, 2024

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A Comprehensive Analysis of Adapter Efficiency.
Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 2024

How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024


2023
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.
Trans. Mach. Learn. Res., 2023

A Survey of Evaluation Metrics Used for NLG Systems.
ACM Comput. Surv., 2023

A Survey of Adversarial Defenses and Robustness in NLP.
ACM Comput. Surv., 2023

Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
CoRR, 2023

Svarah: Evaluating English ASR Systems on Indian Accents.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Building Text-to-Speech Systems for the Next Billion Users.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Scaling Graph Propagation Kernels for Predictive Learning.
Frontiers Big Data, 2022

IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages.
CoRR, 2022

Aksharantar: Towards building open transliteration tools for the next billion users.
CoRR, 2022

A Survey in Adversarial Defences and Robustness in NLP.
CoRR, 2022

IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.
CoRR, 2022

Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

IndicBART: A Pre-trained Model for Indic Natural Language Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Input-specific Attention Subnetworks for Adversarial Detection.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Towards Building ASR Systems for the Next Billion Users.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Framework for Rationale Extraction for Deep QA models.
CoRR, 2021

On the Prunability of Attention Heads in Multilingual BERT.
CoRR, 2021

IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages.
CoRR, 2021

A Primer on Pretrained Multilingual Language Models.
CoRR, 2021

Unsupervised Deep Video Denoising.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Perturbation CheckLists for Evaluating NLG Evaluation Metrics.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

A Joint Training Framework for Open-World Knowledge Graph Embeddings.
Proceedings of the 3rd Conference on Automated Knowledge Base Construction, 2021

The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Systematic Evaluation of Object Detection Networks for Scientific Plots.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining.
Trans. Assoc. Comput. Linguistics, 2020

Evaluating a Generative Adversarial Framework for Information Retrieval.
CoRR, 2020

On the Importance of Local Information in Transformer Based Models.
CoRR, 2020

On Incorporating Structural Information to improve Dialogue Response Generation.
CoRR, 2020

AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages.
CoRR, 2020

PlotQA: Reasoning over Scientific Plots.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards Interpreting BERT for Reading Comprehension Based QA.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

On the weak link between importance and prunability of attention heads.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Towards Transparent and Explainable Attention Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2019

Graph Convolutional Network with Sequential Attention for Goal-Oriented Dialogue Systems.
Trans. Assoc. Comput. Linguistics, 2019

Studying the plasticity in deep convolutional neural networks using random pruning.
Mach. Vis. Appl., 2019

Scene Graph based Image Retrieval - A case study on the CLEVR Dataset.
CoRR, 2019

Data Interpretation over Plots.
CoRR, 2019

Frustratingly Poor Performance of Reading Comprehension Models on Non-adversarial Examples.
CoRR, 2019

On Knowledge distillation from complex networks for response prediction.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

FigureNet: A Deep Learning model for Question-Answering on Scientific Plots.
Proceedings of the International Joint Conference on Neural Networks, 2019

Let's Ask Again: Refine Network for Automatic Question Generation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Efficient Video Classification Using Fewer Frames.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Re-Evaluating ADEM: A Deeper Look at Scoring Dialogue Responses.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Leveraging Orthographic Similarity for Multilingual Neural Transliteration.
Trans. Assoc. Comput. Linguistics, 2018

A Question-Answering framework for plots using Deep learning.
CoRR, 2018

Fusion Graph Convolutional Networks.
CoRR, 2018

HOPF: Higher Order Propagation Framework for Deep Collective Classification.
CoRR, 2018

Learning Disentangled Multimodal Representations for the Fashion Domain.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

On Controllable Sparse Alternatives to Softmax.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Generating Descriptions from Structured Data Using a Bifocal Attention Mechanism and Gated Orthogonalization.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

A Mixed Hierarchical Attention Based Encoder-Decoder Approach for Standard Table Summarization.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Towards a Better Metric for Evaluating Question Generation Systems.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Towards Exploiting Background Knowledge for Building Conversation Systems.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

A Dataset for Building Code-Mixed Goal Oriented Conversation Systems.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Towards Building Large Scale Multimodal Domain-Aware Conversation Systems.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Multimodal Dialogs (MMD): A large-scale dataset for studying multimodal domain-aware conversations.
CoRR, 2017

A Concept Driven Graph Based Approach for Estimating the Focus Time of a Document.
Proceedings of the Mining Intelligence and Knowledge Exploration, 2017

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain.
Proceedings of the 5th International Conference on Learning Representations, 2017

Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Diversity driven attention model for query-based abstractive summarization.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Correlational Neural Networks.
Neural Comput., 2016

Sharing Network Parameters for Crosslingual Named Entity Recognition.
CoRR, 2016

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning.
Proceedings of the NAACL HLT 2016, 2016

Multilingual Multimodal Language Processing Using Neural Networks.
Proceedings of the Tutorial Abstracts, 2016

Statistical Machine Translation between Related Languages.
Proceedings of the Tutorial Abstracts, 2016

Substring-based unsupervised transliteration with phonetic and contextual knowledge.
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016

A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation.
Proceedings of the COLING 2016, 2016

2015
ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources.
CoRR, 2015

Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

2014
An Autoencoder Approach to Learning Bilingual Word Representations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Claims on demand - an initial demonstration of a system for automatic detection and polarity identification of context dependent claims in massive corpora.
Proceedings of the COLING 2014, 2014

2013
Offering language based services on social media by identifying user's preferred language(s) from romanized text.
Proceedings of the 22nd International World Wide Web Conference, 2013

Improving reordering performance using higher order and structural features.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Lost in Translation: Viability of Machine Translation for Cross Language Sentiment Analysis.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2013

Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Experiences in Resource Generation for Machine Translation through Crowdsourcing.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Report of the Shared Task on Learning Reordering from Word Alignments at RSMT 2012.
Proceedings of the Workshop on Reordering for Statistical Machine Translation@COLING 2012, 2012

Whitepaper for Shared Task on Learning Reordering from Word Alignments at RSMT 2012.
Proceedings of the Workshop on Reordering for Statistical Machine Translation@COLING 2012, 2012

I Can Sense It: a Comprehensive Online System for WSD.
Proceedings of the COLING 2012, 2012

2011
It Takes Two to Tango: A Bilingual Unsupervised Approach for Estimating Sense Distributions using Expectation Maximization.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Together We Can: Bilingual Bootstrapping for WSD.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

2010
Compositional Machine Transliteration.
ACM Trans. Asian Lang. Inf. Process., 2010

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures.
Proceedings of the 5th International Workshop on Semantic Evaluation, 2010

CFILT: Resource Conscious Approaches for All-Words Domain Specific WSD.
Proceedings of the 5th International Workshop on Semantic Evaluation, 2010

Improving the Multilingual User Experience of Wikipedia Using Cross-Language Name Search.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Everybody loves a rich cousin: An empirical study of transliteration through bridge languages.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Transliteration Equivalence Using Canonical Correlation Analysis.
Proceedings of the Advances in Information Retrieval, 2010

Value for Money: Balancing Annotation Effort, Lexicon Building and Accuracy for Multilingual WSD.
Proceedings of the COLING 2010, 2010

Verbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language.
Proceedings of the COLING 2010, 2010

Whitepaper of NEWS 2010 Shared Task on Transliteration Mining.
Proceedings of the 2010 Named Entities Workshop, 2010

Report of NEWS 2010 Transliteration Mining Shared Task.
Proceedings of the 2010 Named Entities Workshop, 2010

All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision.
Proceedings of the ACL 2010, 2010

PR + RQ ALMOST EQUAL TO PQ: Transliteration Mining Using Bridge Language.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

2009
Projecting Parameters for Multilingual Word Sense Disambiguation.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

Improving Transliteration Accuracy Using Word-Origin Detection and Lexicon Lookup.
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, 2009

