Sunita Sarawagi

  • Indian Institute of Technology, Bombay, India

According to our database, Sunita Sarawagi authored at least 142 papers between 1994 and 2024.

ACM Fellow

ACM Fellow 2021, "For contributions to statistical machine learning for information analysis, extraction, and integration".




SALSA: Speedy ASR-LLM Synchronous Aggregation.
CoRR, 2024

PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Training of Language Models with Compact and Consistent Next Token Distributions.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Improving RNN-Transducers with Acoustic LookAhead.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models.
Proceedings of the International Conference on Machine Learning, 2023

In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Modern AI for Analyzing Large Structured Databases: Opportunities and Challenges.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Benchmarking and Improving Text-to-SQL Generation under Ambiguity.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Bootstrapping Multilingual Semantic Parsers using Large Language Models.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Structured Case-Based Reasoning for Inference-Time Adaptation of Text-to-SQL Parsers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Bootstrapping Multilingual Semantic Parsers using Large Language Models.
CoRR, 2022

AI and data science centers in top Indian academic institutions.
Commun. ACM, 2022

Learning Recourse on Instance Environment to Enhance Prediction Accuracy.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Coherent Probabilistic Aggregate Queries on Long-horizon Forecasts.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Focus on the Common Good: Group Distributional Robustness Follows.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Adaptive Discounting of Implicit Language Models in RNN-Transducers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Adapting Multilingual Models for Code-Mixed Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Quality Scoring of Source Words in Neural Translation Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Accurate Online Posterior Alignments for Principled Lexically-Constrained Decoding.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Deep Indexed Active Learning for Matching Heterogeneous Entity Representations.
Proc. VLDB Endow., 2021

Missing Value Imputation on Multidimensional Time Series.
Proc. VLDB Endow., 2021

Long Range Probabilistic Forecasting in Time-Series using High Order Statistics.
CoRR, 2021

Long Horizon Forecasting with Temporal Point Processes.
Proceedings of the WSDM '21, 2021

Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Training Data Augmentation for Code-Mixed Translation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Low Resource ASR: The Surprising Effectiveness of High Resource Transliteration.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

Error-Driven Fixed-Budget ASR Personalization for Accented Speakers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

What's in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

Black-Box Adaptation of ASR for Accented Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Efficient Domain Generalization via Common-Specific Low-Rank Decomposition.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning from Rules Generalizing Labeled Exemplars.
Proceedings of the 8th International Conference on Learning Representations, 2020

NLP Service APIs and Models for Efficient Registration of New Clients.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Robust Data Programming with Precision-guided Labeling Functions.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Data Programming using Continuous and Quality-Guided Labeling Functions.
CoRR, 2019

Calibration of Encoder Decoder Models for Neural Machine Translation.
CoRR, 2019

Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units.
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

Posterior Attention Models for Sequence to Sequence Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Parallel Iterative Edit Models for Local Sequence Transduction.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Continual Learning with Neural Networks: A Review.
Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2019

Topic Sensitive Attention on Generic Corpora Corrects Sense Bias in Pretrained Embeddings.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Column Segmentation.
Encyclopedia of Database Systems, Second Edition, 2018

ARMDN: Associative and Recurrent Mixture Density Networks for eRetail Demand Forecasting.
CoRR, 2018

Trainable Calibration Measures For Neural Networks From Kernel Mean Embeddings.
Proceedings of the 35th International Conference on Machine Learning, 2018

Generalizing Across Domains via Cross-Gradient Training.
Proceedings of the 6th International Conference on Learning Representations, 2018

Surprisingly Easy Hard-Attention for Sequence to Sequence Learning.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

Labeled Memory Networks for Online Model Adaptation.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Label Organized Memory Augmented Neural Network.
CoRR, 2017

Occurrence Statistics of Entities, Relations and Types on the Web.
CoRR, 2016

Discovering Structure in the Universe of Attribute Names.
Proceedings of the 25th International Conference on World Wide Web, 2016

Privacy-preserving Class Ratio Estimation.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

Length bias in Encoder Decoder Models and a Case for Global Conditioning.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Numerical Relation Extraction with Minimal Supervision.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Mining Subjective Properties on the Web.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015

A few good predictions: selective node labeling in a social network.
Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, 2014

Open-domain quantity queries on web tables: annotation, response, and consensus models.
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection.
Proceedings of the 31st International Conference on Machine Learning, 2014

Special issue on best papers of VLDB 2011.
VLDB J., 2013

Data-based research at IIT Bombay.
SIGMOD Rec., 2013

Answering Table Queries on the Web using Column Keywords.
Proc. VLDB Endow., 2012

Active Evaluation of Classifiers on Large Datasets.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Letter from the VLDB 2011 Research Track Co-Chair.
Proc. VLDB Endow., 2011

Letter from the Research Track Co-Chair.
Proc. VLDB Endow., 2011

Joint training for open-domain extraction on the web: exploiting overlap when supervision is limited.
Proceedings of the Fourth International Conference on Web Search and Web Data Mining, 2011

Annotating and Searching Web Tables Using Entities, Types and Relationships.
Proc. VLDB Endow., 2010

Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials.
J. Mach. Learn. Res., 2010

Enhancing Search with Structure.
IEEE Data Eng. Bull., 2010

Joint Structured Models for Extraction from Overlapping Sources.
CoRR, 2010

MAP estimation in Binary MRFs via Bipartite Multi-cuts.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

Column Segmentation.
Encyclopedia of Database Systems, 2009

Answering Web Questions Using Structured Data - Dream or Reality?
Proc. VLDB Endow., 2009

Answering Table Augmentation Queries from Unstructured Lists on the Web.
Proc. VLDB Endow., 2009

Generalized Collective Inference with Symmetric Clique Potentials.
CoRR, 2009

Efficient top-k count queries over imprecise duplicates.
Proceedings of the EDBT 2009, 2009

Querying for relations from the semi-structured Web.
Proceedings of the 15th International Conference on Management of Data, 2009

Queries over Unstructured Data: Probabilistic Methods to the Rescue - (Keynote).
Proceedings of the Enabling Real-Time Business Intelligence - Third International Workshop, 2009

Domain adaptation of information extraction models.
SIGMOD Rec., 2008

The Claremont report on database research.
SIGMOD Rec., 2008

Information Extraction.
Found. Trends Databases, 2008

Accurate max-margin training for structured output spaces.
Proceedings of the 25th International Conference on Machine Learning, 2008

Probabilistic Graphical Models and their Role in Databases.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Domain Adaptation of Conditional Probability Models Via Feature Subsetting.
Proceedings of the Knowledge Discovery in Databases: PKDD 2007, 2007

Efficient inference with cardinality-based clique potentials.
Proceedings of the 24th International Conference on Machine Learning, 2007

Creating Probabilistic Databases from Information Extraction Models.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

Record linkage: similarity measures and algorithms.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

Efficient inference on sequence segmentation models.
Proceedings of the 23rd International Conference on Machine Learning, 2006

Integrating Unstructured Data into Relational Databases.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Efficient Batch Top-k Search for Dictionary-based Entity Recognition.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Scalable Information Extraction and Integration.
Proceedings of the 13th International Conference on Management of Data, 2006

Text Classification with Evolving Label-Sets.
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 2005

Learning to extract information from large websites using sequential models.
Proceedings of the Advances in Data Management 2005, 2005

Extracting predicates from mining models for efficient query evaluation.
ACM Trans. Database Syst., 2004

Learning to extract information from large domain-specific websites using sequential models.
SIGKDD Explor., 2004

Efficient set joins on similarity predicates.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

HIClass: Hyper-interactive Text Classification by Interactive Supervision of Document and Term Labels.
Proceedings of the Knowledge Discovery in Databases: PKDD 2004, 2004

Document Classification Through Interactive Supervision of Document and Term Labels.
Proceedings of the Knowledge Discovery in Databases: PKDD 2004, 2004

Discriminative Methods for Multi-labeled Classification.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2004

Semi-Markov Conditional Random Fields for Information Extraction.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Models and Indices for Integrating Unstructured Data with a Relational Database.
Proceedings of the KDID 2004, 2004

Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods.
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004

Resolving citations in a paper repository.
SIGKDD Explor., 2003

Factorizing Complex Predicates in Queries to Exploit Indexes.
Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003

Cross-training: learning probabilistic mappings between topics.
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003

Scaling up the ALIAS Duplicate Elimination System.
Proceedings of the 19th International Conference on Data Engineering, 2003

Sequence Data Mining Techniques and Applications.
Proceedings of the 19th International Conference on Data Engineering, 2003

ALIAS: An Active Learning led Interactive Deduplication System.
Proceedings of 28th International Conference on Very Large Data Bases, 2002

Automation in Information Extraction and Data Integration.
Proceedings of 28th International Conference on Very Large Data Bases, 2002

Interactive deduplication using active learning.
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002

Scaling multi-class support vector machines using inter-class confusion.
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002

Efficient Evaluation of Queries with Mining Predicates.
Proceedings of the 18th International Conference on Data Engineering, 2002

User-cognizant multidimensional analysis.
VLDB J., 2001

Letter from the Special Issue Editor.
IEEE Data Eng. Bull., 2001

iDiff: Informative Summarization of Differences in Multidimensional Aggregates.
Data Min. Knowl. Discov., 2001

Intelligent Rollups in Multidimensional OLAP Data.
Proceedings of the VLDB 2001, 2001

Automatic Segmentation of Text into Structured Records.
Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001

Reminiscences on Influential Papers.
SIGMOD Rec., 2000

Data Mining Models as Services on the Internet.
SIGKDD Explor., 2000

SIGKDD Explor., 2000

SIGKDD Explor., 2000

Automatically Extracting Structure from Free Text Addresses.
IEEE Data Eng. Bull., 2000

Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications.
Data Min. Knowl. Discov., 2000

User-Adaptive Exploration of Multidimensional Data.
Proceedings of the VLDB 2000, 2000

i³: Intelligent, Interactive Investigation of OLAP data cubes.
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000

Explaining Differences in Multidimensional Aggregates.
Proceedings of the VLDB'99, 1999

Mining Surprising Patterns Using Temporal Description Length.
Proceedings of the VLDB'98, 1998

Integrating Mining with Relational Database Systems: Alternatives and Implications.
Proceedings of the SIGMOD 1998, 1998

Mining Generalized Association Rules and Sequential Patterns Using SQL Queries.
Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998

Discovery-Driven Exploration of OLAP Data Cubes.
Proceedings of the Advances in Database Technology, 1998

Execution Reordering for Tertiary Memory Access.
IEEE Data Eng. Bull., 1997

Indexing OLAP Data.
IEEE Data Eng. Bull., 1997

Modeling Multidimensional Databases.
Proceedings of the Thirteenth International Conference on Data Engineering, 1997

Reordering Query Execution in Tertiary Memory Databases.
Proceedings of the VLDB'96, 1996

On the Computation of Multidimensional Aggregates.
Proceedings of the VLDB'96, 1996

Query Processing in Tertiary Memory Databases.
Proceedings of the VLDB'95, 1995

Database Systems for Efficient Access to Tertiary Memory.
Proceedings of the Fourteenth IEEE Symposium on Mass Storage Systems, 1995

Efficient Organization of Large Multidimensional Arrays.
Proceedings of the Tenth International Conference on Data Engineering, 1994
