Martin Potthast

Orcid: 0000-0003-2451-0665

Affiliations:
  • University of Kassel, Kassel, Germany
  • hessian.AI, Darmstadt, Germany
  • ScaDS.AI, Dresden and Leipzig, Germany
  • Leipzig University, Leipzig, Germany (former)
  • Bauhaus University, Weimar, Germany (former)

According to our database, Martin Potthast authored at least 393 papers between 2006 and 2024.

Bibliography

2024
Report on the 1st International Workshop on Open Web Search (WOWS 2024) at ECIR 2024.
SIGIR Forum, June, 2024

Impact and development of an Open Web Index for open web search.
J. Assoc. Inf. Sci. Technol., May, 2024

Task-Oriented Paraphrase Analytics.
Dataset, May, 2024

Supplementary run files for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".
Dataset, May, 2024

Manipulating Embeddings of Stable Diffusion Prompts.
Dataset, May, 2024

Who Determines What Is Relevant? Humans or AI? Why Not Both?
Commun. ACM, April, 2024

Touché24-Image-Retrieval-and-Generation-for-Arguments.
Dataset, April, 2024

webis-de/WWW-24: Release 0.1.0.
Dataset, March, 2024

Webis Generated Native Ads 2024.
Dataset, March, 2024

PAN24 Multi-Author Writing Style Analysis.
Dataset, February, 2024

Touché24-Image-Retrieval-and-Generation-for-Arguments.
Dataset, February, 2024

Wikipedia CRISPR Innovation Tracing Data 2023.
Dataset, January, 2024

Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions.
CoRR, 2024

Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins.
CoRR, 2024

A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking.
CoRR, 2024

If there's a Trigger Warning, then where's the Trigger? Investigating Trigger Warnings at the Passage Level.
CoRR, 2024

Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders.
CoRR, 2024

Detecting Generated Native Ads in Conversational Search.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

A Mastodon Corpus to Evaluate Federated Microblog Search.
Proceedings of the first International Workshop on Open Web Search co-located with the 46th European Conference on Information Retrieval ECIR 2024, 2024

Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Evaluating Generative Ad Hoc Information Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Resources for Combining Teaching and Research in Information Retrieval Coursework.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

ReNeuIR at SIGIR 2024: The Third Workshop on Reaching Efficiency in Neural Information Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Objective Argument Summarization in Search.
Proceedings of the Robust Argumentation Machines - First International Conference, 2024

Classification of Shared Tasks Used in Teaching.
Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 2024

The Information Retrieval Experiment Platform (Extended Abstract).
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Manipulating Embeddings of Stable Diffusion Prompts.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Revisiting Query Variation Robustness of Transformer Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Zero-Shot Generative Large Language Models for Systematic Review Screening Automation.
Proceedings of the Advances in Information Retrieval, 2024

Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models.
Proceedings of the Advances in Information Retrieval, 2024

The Open Web Index - Crawling and Indexing the Web for Public Use.
Proceedings of the Advances in Information Retrieval, 2024

The First International Workshop on Open Web Search (WOWS).
Proceedings of the Advances in Information Retrieval, 2024

Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines.
Proceedings of the Advances in Information Retrieval, 2024

Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2024

TL;DR Progress: Multi-faceted Literature Exploration in Text Summarization.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Task-Oriented Paraphrase Analytics.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Overview of the Multi-Author Writing Style Analysis Task at PAN 2024.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024

De-noising Document Classification Benchmarks via Prompt-Based Rank Pruning: A Case Study.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2024

Team OpenWebSearch at CLEF 2024: QuantumCLEF.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024

Overview of the "Voight-Kampff" Generative AI Authorship Verification Task at PAN and ELOQUENT 2024.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024

Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification Condensed Lab Overview.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2024

Team OpenWebSearch at CLEF 2024: LongEval.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024

A User Study on the Acceptance of Native Advertising in Generative IR.
Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval, 2024

Product Spam on YouTube: A Case Study.
Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval, 2024

2023
Small-Text: Active Learning for Text Classification in Python.
Dataset, December, 2023

EMNLP-23-Bootstrapping-a-Violence-Detector-for-Fan-Fiction.
Dataset, October, 2023

Webis-Context-SciSumm-2023.
Dataset, October, 2023

Task-Oriented Paraphrase Analytics.
Dataset, October, 2023

Touché23-Image-Retrieval-for-Arguments.
Dataset, September, 2023

Small-Text: Active Learning for Text Classification in Python.
Dataset, August, 2023

Manipulating Embeddings of Stable Diffusion Prompts.
Dataset, August, 2023

ChatNoir Resiliparse.
Dataset, August, 2023

Small-Text: Active Learning for Text Classification in Python.
Dataset, July, 2023

Report on the Dagstuhl Seminar on Frontiers of Information Access Experimentation for Research and Education.
SIGIR Forum, June, 2023

A diachronic perspective on citation latency in Wikipedia articles on CRISPR/Cas-9: an exploratory case study.
Scientometrics, June, 2023

Webis Wikipedia Innovation History 2023.
Dataset, June, 2023

Small-Text: Active Learning for Text Classification in Python.
Dataset, February, 2023

Small-Text: Active Learning for Text Classification in Python.
Dataset, February, 2023

Touché23-Image-Retrieval-for-Arguments.
Dataset, February, 2023

Webis Wikipedia-IPC.
Dataset, February, 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Commercialized Generative AI: A Critical Study of the Feasibility and Ethics of Generating Native Advertising Using Large Language Models in Conversational Web Search.
CoRR, 2023

Using Language Models on Low-end Hardware.
CoRR, 2023

Smooth Operators for Effective Systematic Review Queries.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

pybool_ir: A Toolkit for Domain-Specific Search Experiments.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

The Information Retrieval Experiment Platform.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

On Stance Detection in Image Retrieval for Argumentation.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation.
Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 2023

Frame-oriented Summarization of Argumentative Discussions.
Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue, 2023

A New Dataset for Causality Identification in Argumentative Texts.
Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue, 2023

OpinionConv: Conversational Product Search with Grounded Opinions.
Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue, 2023

SemEval-2023 Task 5: Clickbait Spoiling.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

The Information Retrieval Experiment Platform.
Proceedings of the Lernen, 2023

Mining the History Sections of Wikipedia Articles on Science and Technology.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2023

SMAuC - The Scientific Multi-Authorship Corpus.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2023

Perspectives on Large Language Models for Relevance Judgment.
Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 2023

Trigger Warnings: Bootstrapping a Violence Detector for Fan Fiction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Indicative Summarization of Long Discussions.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Citance-Contextualized Summarization of Scientific Papers.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Dynamic Exploratory Search for the Information Retrieval Anthology.
Proceedings of the Advances in Information Retrieval, 2023

Continuous Integration for Reproducible Shared Tasks with TIRA.io.
Proceedings of the Advances in Information Retrieval, 2023

Bootstrapped nDCG Estimation in the Presence of Unjudged Documents.
Proceedings of the Advances in Information Retrieval, 2023

Overview of Touché 2023: Argument and Causal Retrieval - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2023

Overview of PAN 2023: Authorship Verification, Multi-author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2023

Small-Text: Active Learning for Text Classification in Python.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. EACL 2023, 2023

Paraphrase Acquisition from Image Captions.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Topic Ontologies for Arguments.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Overview of the Multi-Author Writing Style Analysis Task at PAN 2023.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Overview of the Trigger Detection Task at PAN 2023.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Overview of the Authorship Verification Task at PAN 2023.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Open Web Search at LongEval 2023: Reciprocal Rank Fusion on Automatically Generated Query Variants.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Overview of PAN 2023: Authorship Verification, Multi-Author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection - Condensed Lab Overview.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2023

Overview of Touché 2023: Argument and Causal Retrieval.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

The Infinite Index: Information Retrieval on Generative Text-To-Image Models.
Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 2023

Exploring Hyperparameter Usage and Tuning in Machine Learning Research.
Proceedings of the 2nd IEEE/ACM International Conference on AI Engineering, 2023

Modeling Appropriate Language in Argumentation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Trigger Warning Assignment as a Multi-Label Document Classification Problem.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Shared Tasks as Tutorials: A Methodical Approach.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Report on the 13th Conference and Labs of the Evaluation Forum (CLEF 2022): Experimental IR Meets Multilinguality, Multimodality, and Interaction.
SIGIR Forum, December, 2022

Touché23-Image-Retrieval-for-Arguments.
Dataset, November, 2022

Small-Text: Active Learning for Text Classification in Python.
Dataset, October, 2022

Small-Text: Active Learning for Text Classification in Python.
Dataset, October, 2022

Small-Text: Active Learning for Text Classification in Python.
Dataset, September, 2022

Webis Health CauseNet 2022.
Dataset, September, 2022

Small-Text: Active Learning for Text Classification in Python.
Dataset, June, 2022

Touché22-Image-Retrieval-for-Arguments.
Dataset, June, 2022

Touché22-Image-Retrieval-for-Arguments.
Dataset, June, 2022

Touché22-Image-Retrieval-for-Arguments.
Dataset, June, 2022

PAN22 Authorship Analysis: Style Change Detection.
Dataset, March, 2022

Webis Clickbait Spoiling Corpus 2022.
Dataset, March, 2022

Webis Clickbait Spoiling Corpus 2022.
Dataset, March, 2022

Webis-MS-MARCO-Anchor-Texts-22.
Dataset, January, 2022

WARC-DL: Scalable Web Archive Processing for Deep Learning.
CoRR, 2022

Trigger Warnings: Bootstrapping a Violence Detector for FanFiction.
CoRR, 2022

Tracking Discourse Influence in Darknet Forums.
CoRR, 2022

Webis at TREC 2022: Deep Learning and Health Misinformation.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

How Train-Test Leakage Affects Zero-Shot Retrieval.
Proceedings of the String Processing and Information Retrieval, 2022

Differential Bias: On the Perceptibility of Stance Imbalance in Argumentation.
Proceedings of the Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022

Sparse Pairwise Re-ranking with Pre-trained Transformers.
Proceedings of the ICTIR '22: The 2022 ACM SIGIR International Conference on the Theory of Information Retrieval, Madrid, Spain, July 11, 2022

Visual Web Archive Quality Assessment.
Proceedings of the Linking Theory and Practice of Digital Libraries, 2022

SUMMARY WORKBENCH: Unifying Application and Evaluation of Text Summarization Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

The Power of Anchor Text in the Neural Retrieval Era.
Proceedings of the Advances in Information Retrieval, 2022

Overview of Touché 2022: Argument Retrieval - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2022

Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, Style Change Detection, and Trigger Detection - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2022

Mining Health-related Cause-Effect Statements with High Precision at Large Scale.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

CausalQA: A Benchmark for Causal Question Answering.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Overview of the Style Change Detection Task at PAN 2022.
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, 2022

Overview of the Authorship Verification Task at PAN 2022.
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, 2022

Noise-Reduction for Automatically Transferred Relevance Judgments.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2022

Overview of Touché 2022: Argument Retrieval.
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, 2022

Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, and Style Change Detection.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2022

Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Clickbait Spoiling via Question Answering and Passage Retrieval.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection.
Dataset, December, 2021

Touché22-Argument-Retrieval-for-Controversial-Questions.
Dataset, November, 2021

Touché22-Argument-Retrieval-for-Controversial-Questions.
Dataset, November, 2021

Touché21-Argument-Retrieval-for-Controversial-Questions.
Dataset, November, 2021

Touché21-Argument-Retrieval-for-Controversial-Questions.
Dataset, November, 2021

Same Side Stance Classification Resampled Datasets.
Dataset, September, 2021

Same Sentiment Classification Train/Dev/Test Pair IDs.
Dataset, September, 2021

Same Side Stance Classification Adversarial Test Cases.
Dataset, September, 2021

Webis-ArgImages-21.
Dataset, August, 2021

Webis-ArgImages-21.
Dataset, August, 2021

Webis-Dataset-Reviews-21.
Dataset, February, 2021

Webis-WebSeg-20-Algorithm-Segmentations.
Dataset, January, 2021

Meta-Information in Conversational Search.
ACM Trans. Inf. Syst., 2021

The information retrieval anthology 2021: inaugural status report and challenges ahead.
SIGIR Forum, 2021

Predicting essay quality from search and writing behavior.
J. Assoc. Inf. Sci. Technol., 2021

STEREO: Scientific Text Reuse in Open Access Publications.
CoRR, 2021

FastWARC: Optimizing Large-Scale Web Archive Analytics.
CoRR, 2021

The Impact of Main Content Extraction on Near-Duplicate Detection.
CoRR, 2021

BERTian Poetics: Constrained Composition with Masked LMs.
CoRR, 2021

Modeling Proficiency with Implicit User Representations.
CoRR, 2021

Uncertainty-based Query Strategies for Active Learning with Transformers.
CoRR, 2021

Argument Undermining: Counter-Argument Generation by Attacking Weak Premises.
CoRR, 2021

Webis at TREC 2021: Deep Learning, Health Misinformation, and Podcasts Tracks.
Proceedings of the Thirtieth Text REtrieval Conference, 2021

The Information Retrieval Anthology.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Identifying Queries in Instant Search Logs.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Summary Explorer: Visualizing the State of the Art in Text Summarization.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

On Classifying whether Two Texts are on the Same Side of an Argument.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Casting the Same Sentiment Classification Problem.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

An Empirical Comparison of Web Page Segmentation Algorithms.
Proceedings of the Advances in Information Retrieval, 2021

Overview of Touché 2021: Argument Retrieval - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2021

Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection - Extended Abstract.
Proceedings of the Advances in Information Retrieval, 2021

Overview of the Style Change Detection Task at PAN 2021.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, 2021

Overview of the Cross-Domain Authorship Verification Task at PAN 2021.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, 2021

Overview of Touché 2021: Argument Retrieval.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2021

Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2021

Learning to Rank Arguments with Feature Selection.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, 2021

Image Retrieval for Arguments Using Stance-Aware Query Expansion.
Proceedings of the 8th Workshop on Argument Mining, 2021

Key Point Analysis via Contrastive Learning and Extractive Argument Summarization.
Proceedings of the 8th Workshop on Argument Mining, 2021

Generating Informative Conclusions for Argumentative Texts.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Beyond Metadata: What Paper Authors Say About Corpora They Use.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Counter-Argument Generation by Attacking Weak Premises.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Webis SCSmeta 2021.
Dataset, October, 2020

Webis-WebSeg-20-Algorithm-Segmentations.
Dataset, October, 2020

CauseNet: Towards a Causality Graph Extracted from the Web.
Dataset, October, 2020

Touché20-Argument-Retrieval-for-Controversial-Questions.
Dataset, September, 2020

Touché20-Argument-Retrieval-for-Controversial-Questions.
Dataset, September, 2020

Touché20-Argument-Retrieval-for-Controversial-Questions.
Dataset, September, 2020

Webis Argument Quality Corpus 2020 (Webis-ArgQuality-20).
Dataset, May, 2020

Disaster Tweet Corpus 2020.
Dataset, March, 2020

PAN20 Authorship Analysis: Celebrity Profiling.
Dataset, February, 2020

Webis Abstractive Snippet Corpus 2020.
Dataset, February, 2020

The dilemma of the direct answer.
SIGIR Forum, 2020

On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification.
it Inf. Technol., 2020

The Importance of Suppressing Domain Style in Authorship Analysis.
CoRR, 2020

Common Conversational Community Prototype: Scholarly Conversational Assistant.
CoRR, 2020

Abstractive Snippet Generation.
Proceedings of the WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020

Sampling Bias Due to Near-Duplicates in Learning to Rank.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Towards Predicting the Subscription Status of Twitch.tv Users - ECML-PKDD ChAT Discovery Challenge 2020.
Proceedings of ECML-PKDD 2020 ChAT Discovery Challenge on Chat Analytics for Twitch co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2020 (ECML-PKDD 2020), 2020

Analysis of Detection Models for Disaster-Related Tweets.
Proceedings of the 17th International Conference on Information Systems for Crisis Response and Management, 2020

Task Proposal: Abstractive Snippet Generation for Web Pages.
Proceedings of the 13th International Conference on Natural Language Generation, 2020

Web Archive Analytics.
Proceedings of the 50. Jahrestagung der Gesellschaft für Informatik, INFORMATIK 2020 - Back to the Future, Karlsruhe, Germany, 28. September, 2020

A Search Engine for Police Press Releases to Double-Check the News.
Proceedings of the Advances in Information Retrieval, 2020

The Effect of Content-Equivalent Near-Duplicates on the Evaluation of Search Engines.
Proceedings of the Advances in Information Retrieval, 2020

Touché: First Shared Task on Argument Retrieval.
Proceedings of the Advances in Information Retrieval, 2020

News Editorials: Towards Summarizing Long Argumentative Texts.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Overview of the Style Change Detection Task at PAN 2020.
Proceedings of the Working Notes of CLEF 2020, 2020

Overview of the Celebrity Profiling Task at PAN 2020.
Proceedings of the Working Notes of CLEF 2020, 2020

Overview of the Cross-Domain Authorship Verification Task at PAN 2020.
Proceedings of the Working Notes of CLEF 2020, 2020

Overview of Touché 2020: Argument Retrieval.
Proceedings of the Working Notes of CLEF 2020, 2020

Overview of Touché 2020: Argument Retrieval - Extended Abstract.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2020

Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2020

Exploring Argument Retrieval with Transformers.
Proceedings of the Working Notes of CLEF 2020, 2020

Web Page Segmentation Revisited: Evaluation Framework and Dataset.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

CauseNet: Towards a Causality Graph Extracted from the Web.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

The Impact of Negative Relevance Judgments on NDCG.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Estimating Topic Difficulty Using Normalized Discounted Cumulated Gain.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Efficient Pairwise Annotation of Argument Quality.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Crawling and Preprocessing Mailing Lists At Scale for Dialog Analysis.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Target Inference in Argument Conclusion Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
PAN19 Authorship Analysis: Cross-Domain Authorship Attribution.
Dataset, November, 2019

Webis-Web-Errors-19.
Dataset, April, 2019

Webis-Web-Archive-17 Content Error Annotations.
Dataset, March, 2019

PAN19 Authorship Analysis: Celebrity Profiling.
Dataset, January, 2019

PAN19 Authorship Analysis: Celebrity Profiling.
Dataset, January, 2019

Webis-Web-Archive-17 Content Error Annotations.
Dataset, January, 2019

Modeling the usefulness of search results as measured by information use.
Inf. Process. Manag., 2019

Debiasing Vandalism Detection Models at Wikidata.
Proceedings of the World Wide Web Conference, 2019

Argument Search: Assessing Argument Relevance.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

SemEval-2019 Task 4: Hyperpartisan News Detection.
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019

Generalizing Unmasking for Short Texts.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Summarizing E-sports matches and tournaments: the example of counter-strike: global offensive.
Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, 2019

GameStory Task at MediaEval 2019.
Proceedings of the Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

Data Acquisition for Argument Search: The args.me Corpus.
Proceedings of the KI 2019: Advances in Artificial Intelligence, 2019

A Dataset for Content Error Detection in Web Archives.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

Towards Summarization for Social Media - Results of the TL;DR Challenge.
Proceedings of the 12th International Conference on Natural Language Generation, 2019

A Decade of Shared Tasks in Digital Text Forensics at PAN.
Proceedings of the Advances in Information Retrieval, 2019

Wikipedia Text Reuse: Within and Without.
Proceedings of the Advances in Information Retrieval, 2019

Overview of the Style Change Detection Task at PAN 2019.
Proceedings of the Working Notes of CLEF 2019, 2019

Overview of the Celebrity Profiling Task at PAN 2019.
Proceedings of the Working Notes of CLEF 2019, 2019

Overview of the Cross-domain Authorship Attribution Task at PAN 2019.
Proceedings of the Working Notes of CLEF 2019, 2019

Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2019

Same Side Stance Classification Using Contextualized Sentence Embeddings.
Proceedings of the Same Side Stance Classification Shared Task organized as a part of the 6th Workshop on Argument Mining (ArgMining 2019) and co-located with the 57th Annual Meeting of the Association for Computational Linguistics (ACL19), 2019

Celebrity Profiling.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Heuristic Authorship Obfuscation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Bias Analysis and Mitigation in the Evaluation of Authorship Verification.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Evolution of the PAN Lab on Digital Text Forensics.
Proceedings of the Information Retrieval Evaluation in a Changing World, 2019

TIRA Integrated Research Architecture.
Proceedings of the Information Retrieval Evaluation in a Changing World, 2019

2018
Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection.
Dataset, November, 2018

PAN18 Multi-Author Analysis: Style-Change-Detection.
Dataset, September, 2018

PAN18 Author Identification: Attribution.
Dataset, September, 2018

Webis YouTube 8M Augmented 2018.
Dataset, July, 2018

Webis Wikipedia Text Reuse Corpus 2018 (Webis-Wikipedia-Text-Reuse-18).
Dataset, July, 2018

BuzzFeed-Webis Fake News Corpus 2016.
Dataset, February, 2018

Reproducible Web Corpora: Interactive Archiving with Automatic Quality Assessment.
ACM J. Data Inf. Qual., 2018

Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook.
ACM J. Data Inf. Qual., 2018

The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength.
CoRR, 2018

Heuristic Feature Selection for Clickbait Detection.
CoRR, 2018

A User Study on Snippet Generation: Text Reuse vs. Paraphrases.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Team ORG @ GameStory Task 2018.
Proceedings of the Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

GameStory Task at MediaEval 2018.
Proceedings of the Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

Task Proposal: The TL;DR Challenge.
Proceedings of the 11th International Conference on Natural Language Generation, 2018

Towards Crowdsourcing Clickbait Labels for YouTube Videos.
Proceedings of the HCOMP 2018 Works in Progress and Demonstration Papers Track of the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2018), 2018

Predicting Retrieval Success Based on Information Use for Writing Tasks.
Proceedings of the Digital Libraries for Open Knowledge, 2018

A Plan for Ancillary Copyright: Original Snippets.
Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval co-located with 40th European Conference on Information Retrieval (ECIR 2018), 2018

Shaping the Information Nutrition Label.
Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval co-located with 40th European Conference on Information Retrieval (ECIR 2018), 2018

Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl.
Proceedings of the Advances in Information Retrieval, 2018

WASP: Web Archiving and Search Personalized.
Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, 2018

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Belgium, October 31, 2018

Crowdsourcing a Large Corpus of Clickbait on Twitter.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Overview of PAN 2018 - Author Identification, Author Profiling, and Author Obfuscation.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2018

Overview of the Author Obfuscation Task at PAN 2018: A New Approach to Measuring Safety.
Proceedings of the Working Notes of CLEF 2018, 2018

Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter.
Proceedings of the Working Notes of CLEF 2018, 2018

Overview of the Author Identification Task at PAN-2018: Cross-domain Authorship Attribution and Style Change Detection.
Proceedings of the Working Notes of CLEF 2018, 2018

A Stylometric Inquiry into Hyperpartisan and Fake News.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Webis-Web-Archive-17.
Dataset, October, 2017

PAN17 Author Identification: Clustering.
Dataset, September, 2017

Webis Query Spelling Corpus 2017 (Webis-QSpell-17).
Dataset, August, 2017

Passphone: Outsourcing Phone-based Web Authentication while Protecting User Privacy.
IACR Cryptol. ePrint Arch., 2017

Proceedings of the WSDM Cup 2017: Vandalism Detection and Triple Scoring.
CoRR, 2017

Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017.
CoRR, 2017

WSDM Cup 2017: Vandalism Detection and Triple Scoring.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017

A Large-Scale Query Spelling Correction Corpus.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Spatio-Temporal Analysis of Reverted Wikipedia Edits.
Proceedings of the Eleventh International Conference on Web and Social Media, 2017

TL;DR: Mining Reddit to Learn Automatic Summarization.
Proceedings of the Workshop on New Frontiers in Summarization, 2017

Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering.
Proceedings of the Working Notes of CLEF 2017, 2017

Overview of PAN'17 - Author Identification, Author Profiling, and Author Obfuscation.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2017

Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter.
Proceedings of the Working Notes of CLEF 2017, 2017

Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited.
Proceedings of the Working Notes of CLEF 2017, 2017

Source Retrieval for Web-Scale Text Reuse Detection.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Building an Argument Search Engine for the Web.
Proceedings of the 4th Workshop on Argument Mining, 2017

2016
Wikidata Vandalism Corpus 2016 (WDVC-16).
Dataset, September, 2016

Webis Clickbait Corpus 2016 (Webis-Clickbait-16).
Dataset, March, 2016

On Textual Analysis and Machine Learning for Cyberstalking Detection.
Datenbank-Spektrum, 2016

Visualizing Article Similarities in Wikipedia.
Proceedings of the 18th Eurographics Conference on Visualization, 2016

Algorithms and Corpora for Persian Plagiarism Detection - Overview of PAN at FIRE 2016.
Proceedings of the Text Processing, 2016

Clickbait Detection.
Proceedings of the Advances in Information Retrieval, 2016

Who Wrote the Web? Revisiting Influential Author Identification Research Applicable to Information Retrieval.
Proceedings of the Advances in Information Retrieval, 2016

Clustering by Authorship Within and Across Documents.
Proceedings of the Working Notes of CLEF 2016, 2016

Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2016

Author Obfuscation: Attacking the State of the Art in Authorship Verification.
Proceedings of the Working Notes of CLEF 2016, 2016

Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations.
Proceedings of the Working Notes of CLEF 2016, 2016

Vandalism Detection in Wikidata.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016

How Writers Search: Analyzing the Search and Writing Logs of Non-fictional Essays.
Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval, 2016

2015
PAN15 Author Identification: Verification.
Dataset, September, 2015

Wikidata Vandalism Corpus 2015 (WDVC-15).
Dataset, August, 2015

Report on the Evaluation-as-a-Service (EaaS) Expert Workshop.
SIGIR Forum, 2015

Evaluation-as-a-Service: Overview and Outlook.
CoRR, 2015

Visual Assessment of Alleged Plagiarism Cases.
Comput. Graph. Forum, 2015

Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

Webis: An Ensemble for Twitter Sentiment Detection.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores.
Proceedings of the Advances in Information Retrieval, 2015

Overview of the PAN/CLEF 2015 Evaluation Lab.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2015

Overview of the Author Identification Task at PAN 2015.
Proceedings of the Working Notes of CLEF 2015, 2015

Towards Data Submissions for Shared Tasks: First Experiences for the Task of Text Alignment.
Proceedings of the Working Notes of CLEF 2015, 2015

Overview of the 3rd Author Profiling Task at PAN 2015.
Proceedings of the Working Notes of CLEF 2015, 2015

Source Retrieval for Plagiarism Detection from Large Web Corpora: Recent Approaches.
Proceedings of the Working Notes of CLEF 2015, 2015

2014
PAN14 Originality: Text Alignment.
Dataset, September, 2014

Improving Cloze Test Performance of Language Learners Using Web N-Grams.
Proceedings of the COLING 2014, 2014

Overview of the Author Identification Task at PAN 2014.
Proceedings of the Working Notes for CLEF 2014 Conference, 2014

Overview of the 6th International Competition on Plagiarism Detection.
Proceedings of the Working Notes for CLEF 2014 Conference, 2014

Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling.
Proceedings of the Information Access Evaluation. Multilinguality, Multimodality, and Interaction, 2014

Overview of the Author Profiling Task at PAN 2014.
Proceedings of the Working Notes for CLEF 2014 Conference, 2014

2013
Webis Crowd Paraphrase Corpus 2011 (Webis-CPC-11).
Dataset, June, 2013

Paraphrase acquisition via crowdsourcing and machine learning.
ACM Trans. Intell. Syst. Technol., 2013

Exploratory Search Missions for TREC Topics.
Proceedings of the 3rd European Workshop on Human-Computer Interaction and Information Retrieval, 2013

Overview of the 5th International Competition on Plagiarism Detection.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Recent Trends in Digital Text Forensics and Its Evaluation - Plagiarism Detection, Author Identification, and Author Profiling.
Proceedings of the Information Access Evaluation. Multilinguality, Multimodality, and Visualization, 2013

Crowdsourcing Interaction Logs to Understand Text Reuse from the Web.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Webis Text Reuse Corpus 2012.
Dataset, September, 2012

Technologies for Reusing Text from the Web.
PhD thesis, 2012

WORDGRAPH: Keyword-in-Context Visualization for NETSPEAK's Wildcard Search.
IEEE Trans. Vis. Comput. Graph., 2012

Information Retrieval in the Commentsphere.
ACM Trans. Intell. Syst. Technol., 2012

Webis at the TREC 2012 Session Track.
Proceedings of The Twenty-First Text REtrieval Conference, 2012

ChatNoir: a search engine for the ClueWeb09 corpus.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

Overview of the 4th International Competition on Plagiarism Detection.
Proceedings of the CLEF 2012 Evaluation Labs and Workshop, 2012

Towards optimum query segmentation: in doubt without.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
PAN Wikipedia Vandalism Corpus 2011 (PAN-WVC-11).
Dataset, July, 2011

PAN Plagiarism Corpus 2011 (PAN-PC-11).
Dataset, June, 2011

Fourth international workshop on uncovering plagiarism, authorship, and social software misuse.
SIGIR Forum, 2011

Cross-language plagiarism detection.
Lang. Resour. Evaluation, 2011

Query segmentation revisited.
Proceedings of the 20th International Conference on World Wide Web, 2011

Technologien zur Wiederverwendung von Texten aus dem Web.
Proceedings of the Ausgezeichnete Informatikdissertationen 2011, 2011

Overview of the 2nd International Competition on Wikipedia Vandalism Detection.
Proceedings of the CLEF 2011 Labs and Workshop, 2011

Overview of the 3rd International Competition on Plagiarism Detection.
Proceedings of the CLEF 2011 Labs and Workshop, 2011

The NETSPEAK WORDGRAPH: Visualizing keywords in context.
Proceedings of the IEEE Pacific Visualization Symposium, 2011

2010
PAN Wikipedia Vandalism Corpus 2010 (PAN-WVC-10).
Dataset, July, 2010

Webis Query Segmentation Corpus 2010 (Webis-QSeC-10).
Dataset, July, 2010

PAN Plagiarism Corpus 2010 (PAN-PC-10).
Dataset, May, 2010

Towards comment-based cross-media retrieval.
Proceedings of the 19th International Conference on World Wide Web, 2010

Crowdsourcing a Wikipedia vandalism corpus.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

The power of naive query segmentation.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

Evaluating Humour Features on Web Comments.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Corpus and Evaluation Measures for Automatic Plagiarism Detection.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Retrieving Customary Web Language to Assist Writers.
Proceedings of the Advances in Information Retrieval, 2010

Netspeak - Assisting Writers in Choosing Words.
Proceedings of the Advances in Information Retrieval, 2010

Opinion Summarization of Web Comments.
Proceedings of the Advances in Information Retrieval, 2010

Cross-Language High Similarity Search: Why No Sub-linear Time Bound Can Be Expected.
Proceedings of the Advances in Information Retrieval, 2010

An Evaluation Framework for Plagiarism Detection.
Proceedings of the COLING 2010, 2010

Overview of the 1st International Competition on Wikipedia Vandalism Detection.
Proceedings of the CLEF 2010 LABs and Workshops, 2010

Overview of the 2nd International Competition on Plagiarism Detection.
Proceedings of the CLEF 2010 LABs and Workshops, 2010

2009
PAN Plagiarism Corpus 2009 (PAN-PC-09).
Dataset, September, 2009

Measuring the descriptiveness of web comments.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

2008
Retrieval-Technologien für die Plagiaterkennung in Programmen.
Proceedings of the LWA 2008, 2008

Automatic Vandalism Detection in Wikipedia.
Proceedings of the Advances in Information Retrieval, 2008

A Wikipedia-Based Multilingual Retrieval Model.
Proceedings of the Advances in Information Retrieval, 2008

2007
Webis Wikipedia Vandalism Corpus (Webis-WVC-07).
Dataset, January, 2007

Strategies for retrieving plagiarized documents.
Proceedings of the SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007

Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search.
Proceedings of the SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007

New Issues in Near-duplicate Detection.
Proceedings of the Data Analysis, Machine Learning and Applications, 2007

2006
Hashing-basierte Indizierung: Anwendungsszenarien, Theorie und Methoden.
Proceedings of the LWA 2006: Lernen - Wissensentdeckung - Adaptivität, Hildesheim, Deutschland, October 9th-11th 2006, joint workshop event of several interest groups of the German Society for Informatics (GI) - 14th Workshop on Adaptivity and User Modeling in Interactive Systems (ABIS 2006) - Workshop Information Retrieval 2006 of the Special Interest Group Information Retrieval (FGIR 2006) - Workshop on Knowledge and Experience Management (FGWM 2006), 2006

Putting Successor Variety Stemming to Work.
Proceedings of the Advances in Data Analysis, 2006