Salim Roukos

Orcid: 0000-0003-2140-4349

According to our database, Salim Roukos authored at least 137 papers between 1989 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Graph-based Uncertainty Metrics for Long-form Language Model Outputs.
CoRR, 2024

Retrieval Augmented Generation-Based Incident Resolution Recommendation System for IT Support.
CoRR, 2024

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks.
CoRR, 2024

Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels.
CoRR, 2024

Can a Multichoice Dataset be Repurposed for Extractive Question Answering?
CoRR, 2024

CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems.
CoRR, 2024

Self-Refinement of Language Models from External Proxy Metrics Feedback.
CoRR, 2024

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

CHRONOS: A Schema-Based Event Understanding and Prediction System.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs.
CoRR, 2023

Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency.
CoRR, 2023

Slide, Constrain, Parse, Repeat: Synchronous Sliding Windows for Document AMR Parsing.
CoRR, 2023

AMR Parsing with Instruction Fine-tuned Pre-trained Language Models.
CoRR, 2023

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.
CoRR, 2023

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Ensemble-Instruct: Instruction Tuning Data Generation with a Heterogeneous Mixture of LMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

A Closer Look at the Calibration of Differentially Private Learners.
CoRR, 2022

A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases.
CoRR, 2022

DocAMR: Multi-Sentence AMR Representation and Evaluation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Maximum Bayes Smatch Ensemble Distillation for AMR Parsing.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Zero-shot Entity Linking with Less Data.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Logical Neural Networks for Knowledge Base Completion with Embeddings & Rules.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning to Transpile AMR into SPARQL.
CoRR, 2021

SYGMA: System for Generalizable Modular Question Answering Over Knowledge Bases.
CoRR, 2021

Combining Rules and Embeddings via Neuro-Symbolic AI for Knowledge Base Completion.
CoRR, 2021

Synthetic Target Domain Supervision for Open Retrieval QA.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Bootstrapping Multilingual AMR with Contextual Word Alignments.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

A Semantics-aware Transformer Model of Relation Linking for Knowledge Base Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

KAAPA: Knowledge Aware Answers from PDF Analysis.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Semantic Parsing and Reasoning-Based Approach to Knowledge Base Question Answering.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Question Answering over Knowledge Bases by Leveraging Semantic Parsing and Neuro-Symbolic Reasoning.
CoRR, 2020

End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training.
CoRR, 2020

Leveraging Semantic Parsing for Relation Linking over Knowledge Bases.
Proceedings of the Semantic Web - ISWC 2020, 2020

Multi-Stage Pre-training for Low-Resource Domain Adaptation.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Pushing the Limits of AMR Parsing with Self-Learning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

ARES: A Reading Comprehension Ensembling Service.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020

A Multilingual Reading Comprehension System for more than 100 Languages.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Towards building a Robust Industry-scale Question Answering System.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

GPT-too: A Language-Model-First Approach for AMR-to-Text Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Heuristics for Interpretable Knowledge Graph Contextualization.
CoRR, 2019

Ensembling Strategies for Answering Natural Questions.
CoRR, 2019

Frustratingly Easy Natural Question Answering.
CoRR, 2019

Links with Answers: Query Answering for Customer Support.
Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, 2019

CFO: A Framework for Building Production NLP Systems.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Improving MT post-editing productivity with adaptive confidence estimation for document-specific translation model.
Mach. Transl., 2014

Multi-lingual Text Leveling.
Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

Invited Talk: IBM Cognitive Computing - An NLP Renaissance!
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Adaptive HTER Estimation for Document-Specific MT Post-Editing.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Distilling and exploring nuggets from a corpus.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

Document-Specific Statistical Machine Translation for Improving Human Translation Productivity.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2012

IBM Chinese-to-English PatentMT System for NTCIR-9.
Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, 2011

A Correction Model for Word Alignments.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

A novel approach for proper name transliteration verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Improving Mention Detection Robustness to Noisy Input.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

Learning to Predict Readability using Diverse Linguistic Features.
Proceedings of the COLING 2010, 2010

Active Learning for Mention Detection: A Comparison of Sentence Selection Strategies.
CoRR, 2009

Real Time Translation Services at IBM.
Proceedings of Machine Translation Summit XII: Commercial MT User Program, 2009

Iterative sentence-pair extraction from quasi-parallel corpora for machine translation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

System Combination for Machine Translation of Spoken and Written Language.
IEEE Trans. Speech Audio Process., 2008

Rethinking Full-Text Search for Multi-lingual Databases.
IEEE Data Eng. Bull., 2007

Direct Translation Model 2.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Extracting Social Networks and Biographical Facts From Conversational Speech Transcripts.
Proceedings of the ACL 2007, 2007

Recent results on MT evaluation in the GALE program.
Proceedings of the 2006 International Workshop on Spoken Language Translation, 2006

A Maximum Entropy Word Aligner for Arabic-English Machine Translation.
Proceedings of the HLT/EMNLP 2005, 2005

A Statistical Model for Multilingual Entity Detection and Tracking.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004

IBM spoken language translation system evaluation.
Proceedings of the 2004 International Workshop on Spoken Language Translation, 2004

A Mention-Synchronous Coreference Resolution Algorithm Based On the Bell Tree.
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002.
SIGIR Forum, 2003

Automatic Derivation of Surface Text Patterns for a Maximum Entropy Based Question Answering System.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

Identifying and Tracking Entity Mentions in a Maximum Entropy Framework.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

TIPS: A Translingual Information Processing System.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003

Language Model Based Arabic Word Segmentation.
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003

A multistage algorithm for spotting new words in speech.
IEEE Trans. Speech Audio Process., 2002

IBM's Statistical Question Answering System-TREC 11.
Proceedings of The Eleventh Text REtrieval Conference, 2002

A Flexible Framework for Developing Mixed-Initiative Dialog Systems.
Proceedings of the SIGDIAL 2002 Workshop, 2002

DARPA communicator: cross-system results for the 2001 evaluation.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

DARPA communicator evaluation: progress from 2000 to 2001.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Active Learning for Statistical Natural Language Parsing.
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002

Bleu: a Method for Automatic Evaluation of Machine Translation.
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002

IBM's Statistical Question Answering System - TREC-10.
Proceedings of The Tenth Text REtrieval Conference, 2001

Statistical Methods for Translingual Information Retrieval.
Proceedings of the 2000 Kyoto International Conference on Digital Libraries: Research and Practice, 2000

Real-time multilingual HMM training robust to channel variations.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Statistical methods for topic segmentation.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Free-flow dialog management using forms.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Use of recursive mumble models for confidence measuring.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Story segmentation and topic detection for recognized speech.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

The IBM conversational telephony system for financial applications.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Phrase splicing and variable substitution using the IBM trainable speech synthesis system.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

Ad hoc and Multilingual Information Retrieval at IBM.
Proceedings of The Seventh Text REtrieval Conference, 1998

Audio-Indexing For Broadcast News.
Proceedings of The Seventh Text REtrieval Conference, 1998

A Method for Scoring Correlated Features in Query Expansion.
Proceedings of the SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998

Probabilistic Modeling for Information Retrieval with Unsupervised Training Data.
Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998

Speech Research: Near and Not-so-near Results and What They Might Mean for IUI (Panel).
Proceedings of the 3rd International Conference on Intelligent User Interfaces, 1998

Towards speech understanding across multiple languages.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Maximum likelihood and discriminative training of direct translation models.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

A fast vocabulary independent algorithm for spotting words in speech.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Fast Document Translation for Cross-Language Information Retrieval.
Proceedings of the Machine Translation and the Information Soup, 1998

TREC-6 Ad-Hoc Retrieval.
Proceedings of The Sixth Text REtrieval Conference, 1997

MDI adaptation of language models across corpora.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Feature-based language understanding.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Word-based confidence measures as a guide for stack search in speech recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Fertility Models for Statistical Natural Language Understanding.
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, 1997

TREC-5 Ad Hoc Retrieval Using K Nearest-Neighbors Re-Scoring.
Proceedings of The Fifth Text REtrieval Conference, 1996

Statistical natural language understanding using hidden clumpings.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

An Iterative Algorithm to Build Chinese Language Models.
Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996

A statistical approach to language modelling for the ATIS task.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Language model adaptation via minimum discrimination information.
Proceedings of the 1995 International Conference on Acoustics, 1995

Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task.
Proceedings of the 1995 International Conference on Acoustics, 1995

A Maximum Entropy Model for Prepositional Phrase Attachment.
Proceedings of the Human Language Technology, 1994

Decision Tree Parsing using a Hidden Derivation Model.
Proceedings of the Human Language Technology, 1994

A maximum entropy model for parsing.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Automatic Extraction of Grammars From Annotated Text.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

Adaptive Language Modeling Using The Maximum Entropy Principle.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

Trigger-based language models: a maximum entropy approach.
Proceedings of the IEEE International Conference on Acoustics, 1993

Decision Tree Models Applied to the Labeling of Text with Parts-of-Speech.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Towards History-based Grammars: Using Richer Models for Probabilistic Parsing.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Adaptive language modeling using minimum discriminant estimation.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

Adaptation of large vocabulary recognition system parameters.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

Development and Evaluation of a Broad-Coverage Probabilistic Grammar of English-Language Computer Manuals.
Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, 28 June, 1992

Session 7: Natural Language II.
Proceedings of the Speech and Natural Language, 1991

A Dynamic Language Model for Speech Recognition.
Proceedings of the Speech and Natural Language, 1991

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars.
Proceedings of the Speech and Natural Language, 1991

Classifying words for improved statistical language models.
Proceedings of the 1990 International Conference on Acoustics, 1990

A stochastic segment model for phoneme-based continuous speech recognition.
IEEE Trans. Acoust. Speech Signal Process., 1989

Integrating Speech and Natural Language.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, 1989

The BBN Spoken Language System.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, 1989

Continuous hidden Markov modeling for speaker-independent word spotting.
Proceedings of the IEEE International Conference on Acoustics, 1989

Speech understanding using a unification grammar.
Proceedings of the IEEE International Conference on Acoustics, 1989
