Monojit Choudhury

Orcid: 0000-0001-7473-7839

According to our database, Monojit Choudhury authored at least 148 papers between 2003 and 2024.

Bibliography

2024
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI.
Lang. Resour. Evaluation, June, 2024

[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs.
CoRR, 2024

Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents.
CoRR, 2024

From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences.
CoRR, 2024

Towards Measuring and Modeling "Culture" in LLMs: A Survey.
CoRR, 2024

The Zeno's Paradox of 'Low-Resource' Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Towards Measuring and Modeling "Culture" in LLMs: A Survey.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

INMT-Lite: Accelerating Low-Resource Language Data Collection via Offline Interactive Neural Machine Translation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language We Prompt Them in.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Evaluating Large Language Models for Health-related Queries with Presuppositions.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Probing the Moral Development of Large Language Models through Defining Issues Test.
CoRR, 2023

Tricking LLMs into Disobedience: Understanding, Analyzing, and Preventing Jailbreaks.
CoRR, 2023

LLM-powered Data Augmentation for Enhanced Crosslingual Performance.
CoRR, 2023

DUBLIN - Document Understanding By Language-Image Network.
CoRR, 2023

Prover: Generating Intermediate Steps for NLI with Commonsense Knowledge Retrieval and Next-Step Prediction.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

LLM-powered Data Augmentation for Enhanced Cross-lingual Performance.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DUBLIN: Visual Document Understanding By Language-Image Network.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

Performance and Risk Trade-offs for Multi-word Text Prediction at Scale.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Fairness in Language Models Beyond English: Gaps and Challenges.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Conceptualizing Indigeneity in Social Computing.
Proceedings of the Computer Supported Cooperative Work and Social Computing, 2023

Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2023

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models.
CoRR, 2022

Generating Intermediate Steps for NLI with Next-Step Supervision.
CoRR, 2022

Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages.
CoRR, 2022

Global Readiness of Language Technology for Healthcare: What would it Take to Combat the Next Pandemic?
CoRR, 2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis.
CoRR, 2022

Too Brittle to Touch: Comparing the Stability of Quantization and Distillation towards Developing Low-Resource MT Models.
Proceedings of the Seventh Conference on Machine Translation, 2022

"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

On the Economics of Multilingual Few-shot Learning: Modeling the Cost-Performance Trade-offs of Machine Translated and Manual Data.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Multilingual CheckList: Generation and Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022

Vector Space Interpolation for Query Expansion.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

On the Calibration of Massively Multilingual Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

The Six Conundrums of Building and Deploying Language Technologies for Social Good.
Proceedings of the COMPASS '22: ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies, Seattle, WA, USA, 29 June 2022, 2022

Global Readiness of Language Technology for Healthcare: What Would It Take to Combat the Next Pandemic?
Proceedings of the 29th International Conference on Computational Linguistics, 2022

SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

LITMUS Predictor: An AI Assistant for Building Reliable, High-Performing and Fair Multilingual NLP Systems.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Predicting the Performance of Multilingual NLP Models.
CoRR, 2021

Designing Language Technologies for Social Good: The Road not Taken.
CoRR, 2021

Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance.
CoRR, 2021

Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task.
CoRR, 2021

Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems.
CoRR, 2021

American Politicians Diverge Systematically, Indian Politicians do so Chaotically: Text Embeddings as a Window into Party Polarization.
Proceedings of the Fifteenth International AAAI Conference on Web and Social Media, 2021

Stress Rules from Surface Forms: Experiments with Program Synthesis.
Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021), National Institute of Technology Silchar, Silchar, India, December 16, 2021

On the Universality of Deep Contextual Language Models.
Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021), National Institute of Technology Silchar, Silchar, India, December 16, 2021

GCM: A Toolkit for Generating Synthetic Code-mixed Text.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 2021

Language Translation as a Socio-Technical System: Case-Studies of Mixed-Initiative Interactions.
Proceedings of the COMPASS '21: ACM SIGCAS Conference on Computing and Sustainable Societies, Virtual Event, Australia, 28 June 2021, 2021

Comparing Grammatical Theories of Code-Mixing.
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

How Linguistically Fair Are Multilingual Pre-Trained Language Models?
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
MSIR@FIRE: A Comprehensive Report from 2013 to 2016.
SN Comput. Sci., 2020

Topical Focus of Political Campaigns and its Impact: Findings from Politicians' Hashtag Use during the 2019 Indian Elections.
Proc. ACM Hum. Comput. Interact., 2020

Do Multilingual Users Prefer Chat-bots that Code-mix? Let's Nudge and Find Out!
Proc. ACM Hum. Comput. Interact., 2020

Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Engagement Patterns of Peer-to-Peer Interactions on Mental Health Platforms.
Proceedings of the Fourteenth International AAAI Conference on Web and Social Media, 2020

TaxiNLI: Taking a Ride up the NLU Hill.
Proceedings of the 24th Conference on Computational Natural Language Learning, 2020

GLUECoS: An Evaluation Benchmark for Code-Switched NLP.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

The State and Fate of Linguistic Diversity and Inclusion in the NLP World.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Understanding Script-Mixing: A Case Study of Hindi-English Bilingual Twitter Users.
Proceedings of the 4th Workshop on Computational Approaches to Code Switching, 2020

Code-mixed parse trees and how to find them.
Proceedings of the 4th Workshop on Computational Approaches to Code Switching, 2020

A New Dataset for Natural Language Inference from Code-mixed Conversations.
Proceedings of the 4th Workshop on Computational Approaches to Code Switching, 2020

2019
Identifying and Analyzing Different Aspects of English-Hindi Code-Switching in Twitter.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2019

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities.
CoRR, 2019

Characterizing the Spread of Exaggerated Health News Content over Social Media.
Proceedings of the 30th ACM Conference on Hypertext and Social Media, 2019

INMT: Interactive Neural Machine Translation Prediction.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Processing and Understanding Mixed Language Data.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Characterizing the spread of exaggerated news content over social media.
CoRR, 2018

Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

An Integrated Representation of Linguistic and Social Functions of Code-Switching.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

User Perception of Code-Switching Dialog Systems.
Proceedings of the 15th International Conference on Natural Language Processing, 2018

Word Embeddings for Code-Mixed Language Processing.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Phone Merging For Code-Switched Speech Recognition.
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching@ACL 2018, 2018

Accommodation of Conversational Code-Choice.
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching@ACL 2018, 2018

2017
Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media.
CoRR, 2017

Quantitative Characterization of Code Switching Patterns in Complex Multi-Party Conversations: A Case Study on Hindi Movie Scripts.
Proceedings of the 14th International Conference on Natural Language Processing, 2017

Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks.
Proceedings of the 14th International Conference on Natural Language Processing, 2017

Overview of the FIRE 2017 track: Information Retrieval from Microblogs during Disasters (IRMiDis).
Proceedings of the Working notes of FIRE 2017, 2017

All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

I may talk in English but gaali toh Hindi mein hi denge : A study of English-Hindi code-switching and swearing pattern on social networks.
Proceedings of the 9th International Conference on Communication Systems and Networks, 2017

Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Syntactic complexity of Web search queries through the lenses of language models, networks and users.
Inf. Process. Manag., 2016

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models.
CoRR, 2016

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016.
Proceedings of the Text Processing, 2016

Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Improving Document Ranking for Long Queries with Nested Query Segmentation.
Proceedings of the Advances in Information Retrieval, 2016

2015
Discovering and understanding word level user intent in Web search queries.
J. Web Semant., 2015

POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments.
Proceedings of the 12th International Conference on Natural Language Processing, 2015

Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval.
Proceedings of the Post Proceedings of the Workshops at the 7th Forum for Information Retrieval Evaluation, 2015

2014
Improving unsupervised query segmentation using parts-of-speech sequence information.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Query expansion for mixed-script information retrieval.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Hierarchical Recursive Tagset for Annotating Cooking Recipes.
Proceedings of the 11th International Conference on Natural Language Processing, 2014

"ye word kis lang ka hai bhai?" Testing the Limits of Word level Language Identification.
Proceedings of the 11th International Conference on Natural Language Processing, 2014

POS Tagging of English-Hindi Code-Mixed Social Media Content.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Automatic Discovery of Adposition Typology.
Proceedings of the COLING 2014, 2014

Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System.
Proceedings of the First Workshop on Computational Approaches to Code Switching@EMNLP 2014, 2014

"I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook.
Proceedings of the First Workshop on Computational Approaches to Code Switching@EMNLP 2014, 2014

2013
Language Dynamics in the Framework of Complex Networks: A Case Study on Self-Organization of the Consonant Inventories.
Proceedings of the Cognitive Aspects of Computational Language Acquisition, 2013

Place value: word position shifts vital to search dynamics.
Proceedings of the 22nd International World Wide Web Conference, 2013

Automatically Identifying Vocal Expressions for Music Transcription.
Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013

The Use Of Melodic Scales In Bollywood Music: An Empirical Study.
Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013

Overview of the FIRE 2013 Track on Transliterated Search.
Proceedings of the 5th 2013 Forum on Information Retrieval Evaluation, 2013

Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes.
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, 2013

Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
An IR-based evaluation framework for web search query segmentation.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Can Modern Statistical Parsers Lead to Better Natural Language Understanding for Education?
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2012

2011
Network based models of cognitive and social dynamics of human languages.
Comput. Speech Lang., 2011

Unsupervised query segmentation using only query logs.
Proceedings of the 20th International Conference on World Wide Web, 2011

Query completion without query logs for song search.
Proceedings of the 20th International Conference on World Wide Web, 2011

Challenges in Designing Input Method Editors for Indian Languages: The Role of Word-Origin and Context.
Proceedings of the Workshop on Advances in Text Input Methods, 2011

2010
Modelling the Redundancy of Human Speech Sound Inventories: An Information Theoretic Approach.
J. Quant. Linguistics, 2010

Resource Creation for Training and Testing of Transliteration Systems for Indian Languages.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Global topology of word co-occurrence networks: Beyond the two-regime power-law.
Proceedings of the COLING 2010, 2010

2009
Self-organization of the Sound Inventories: Analysis and Synthesis of the Occurrence and Co-occurrence Networks of Consonants.
J. Quant. Linguistics, 2009

Language Diversity across the Consonant Inventories: A Study in the Framework of Complex Networks.
CoRR, 2009

Discovering Global Patterns in Linguistic Networks through Spectral Analysis: A Case Study of the Consonant Inventories.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Large-Coverage Root Lexicon Extraction for Hindi.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Complex Linguistic Annotation - No Easy Way Out! A Case from Bangla and Hindi POS Labeling Tasks.
Proceedings of the Third Linguistic Annotation Workshop, 2009

Syntax is from Mars while Semantics from Venus! Insights from Spectral Analysis of Distributional Similarity Networks.
Proceedings of the ACL 2009, 2009

2008
Automatic request categorization in internet services.
SIGMETRICS Perform. Evaluation Rev., 2008

Rediscovering the Co-Occurrence Principles of vowel inventories: a Complex Network Approach.
Adv. Complex Syst., 2008

A Common Parts-of-Speech Tagset Framework for Indian Languages.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Unsupervised Parts-of-Speech Induction for Bengali.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Social Network Inspired Models of NLP and Language Evolution.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Invited Talk: Breaking the Zipfian Barrier of NLP.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Modeling the Structure and Dynamics of the Consonant Inventories: A Complex Network Approach.
Proceedings of the COLING 2008, 2008

2007
Investigation and modeling of the structure of texting language.
Int. J. Document Anal. Recognit., 2007

Emergence of Community Structures in Vowel Inventories: An Analysis Based on Complex Networks.
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007

Evolution, Optimization, and Language Change: The Case of Bengali Verb Inflections.
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 2007

Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World's Languages.
Proceedings of the ACL 2007, 2007

2006
Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi.
J. Artif. Soc. Soc. Simul., 2006

Shruti: an embedded text-to-speech system for Indian languages.
IEE Proc. Softw., 2006

Battery-aware code partitioning for a text to speech system.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach.
Proceedings of the ACL 2006, 2006

2004
A Diachronic Approach for Schwa Deletion in Indo Aryan Languages.
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology, 2004

2003
ABHIDHA: An extended WordNet for Indo-Aryan Languages.
Proceedings of the Thirteenth International Work Shop on Research Issues in Data Engineering: Multi-lingual Information Management, 2003

