⬇ Dataset (CSV) 📄 Annotation Guidelines 📋 Annotator Feedback 🔍 Explore Dataset Online 🧩 Most Frequent Schemes Online 🌐 MediaEval 2026

General Information

The goal of this task is to develop AI models capable of detecting and reconstructing implicit arguments — also called enthymemes — in political tweets. The dataset contains tweets annotated for the presence of implicit premises or conclusions, along with full argument reconstructions provided by multiple annotators.

The first data sample will be released on 1 March 2025. Participants are invited to complete two tasks. Task 1 may be completed on its own; Task 2 requires prior completion of Task 1.

⚠ The data contains language that is hurtful towards immigrants. Participants should be prepared to encounter such content when reading the data.

Task 1 — Enthymeme Detection

Given a tweet, determine whether it contains an implicit premise, an implicit conclusion, or neither. This is a three-class classification task.

Input: The raw text of a tweet.
Output: One label: implicit_premise, implicit_conclusion, or none.

An implicit premise is a supporting assumption left unstated that the argument relies on. An implicit conclusion is a claim that follows from the stated premises but is never explicitly made. When neither component is missing, the label is none.

Tweets in the train and dev sets are each annotated by five independent annotators; those in the test set by three. Individual annotator labels — prior to any majority vote — are provided alongside the data, making it possible to treat disagreement as signal rather than noise.

Constrained Run 1 — Text Only

Predict the label from the tweet text alone. No external data or additional annotation information is permitted.

Constrained Run 2 — With Annotator Labels

In addition to the tweet text, participants may use the raw labels provided by three independent annotators. The goal is to investigate whether modelling annotator disagreement improves performance, especially on borderline cases. The output label is the same three-class prediction.

Additional input: Three individual annotator labels per tweet (before majority vote).
Goal: Leverage disagreement as a signal to improve the three-class prediction.
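One simple way to use the raw annotator labels is to convert them into a soft label distribution rather than a single majority-vote class. The sketch below is illustrative only (the function name and label strings follow the task's label set, but the aggregation scheme is an assumption, not part of the official task):

```python
from collections import Counter

# The three class labels defined by Task 1.
LABELS = ["implicit_premise", "implicit_conclusion", "none"]

def soft_label(annotator_labels):
    """Turn raw annotator labels into a probability distribution
    over the three classes, preserving disagreement instead of
    collapsing it with a majority vote."""
    counts = Counter(annotator_labels)
    total = len(annotator_labels)
    return [counts[label] / total for label in LABELS]

# A 2-1 annotator split keeps the minority view as probability
# mass instead of discarding it.
print(soft_label(["implicit_premise", "implicit_premise", "none"]))
```

A model trained against such distributions (e.g. with a cross-entropy loss over soft targets) can learn to be less confident exactly where annotators disagreed, which is the borderline-case behaviour this run is designed to probe.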

Open Run

Any external data sources, pre-trained models, or additional resources may be used. Participants must document all external resources in their working-notes paper.

Evaluation

Performance is measured with macro F1-score. Evaluation is conducted in two modes:

ℹ Both modes are reported. The merged mode is the primary metric used for ranking in the official evaluation.
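For reference, macro F1 averages the per-class F1 scores with equal weight per class, so performance on rare labels counts as much as on frequent ones. A minimal self-contained sketch (equivalent in spirit to `sklearn.metrics.f1_score(..., average="macro")`; the toy data below is invented for illustration):

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1: per-class F1 computed independently,
    then averaged with equal weight for every class."""
    f1s = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)

labels = ["implicit_premise", "implicit_conclusion", "none"]
gold = ["implicit_premise", "none", "none", "implicit_conclusion"]
pred = ["implicit_premise", "none", "implicit_premise", "none"]
print(round(macro_f1(gold, pred, labels), 3))  # prints 0.389
```

Note that a class never predicted and never present contributes an F1 of 0 here, which is one of several conventions; the official scorer's handling of empty classes may differ.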

Task 2 — Proposition Generation

For each tweet classified as containing an implicit argument, generate the text of the missing proposition. Task 2 requires prior completion of Task 1, as the predicted label is part of the input.

Input: Tweet text + Task 1 label (implicit_premise or implicit_conclusion).
Output: A single natural-language sentence expressing the missing proposition.

The generated sentence should be concise and declarative — it should make the unstated assumption or conclusion fully explicit, as if completing the argument. See the example below.

Tweet
"Deterring the plans of illegal people smugglers is essential to controlled immigration. We should support all plans to stop them."
Reconstructed argument
Premise 1 — implicit (to generate): Controlled immigration is desirable.
Premise 2 — explicit: Deterring the plans of illegal people smugglers is essential to controlled immigration.
Conclusion — explicit: We should support all plans to stop them.

In this example, the system should output: "Controlled immigration is desirable."

Evaluation

Task 2 systems are evaluated specifically on their ability to reconstruct the implicit premise — the unstated supporting assumption the argument relies on. The gold-standard reference for each tweet is the implicit premise provided by the annotators in the dataset. In the annotation files, the implicit premise is identifiable as the propositional content marked with (implicit), which annotators write as the text of the missing proposition followed by that tag (e.g. "Vaccines cause harm to the body (implicit).").

Generated propositions are evaluated in two ways. First, BERTScore is used to compare system outputs against the annotator-provided implicit premises. Second, a sampled subset of the test set is evaluated by hand by experienced human annotators, who assess whether the generated proposition correctly captures the implicit assumption underlying the argument.
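The automatic part of this evaluation is a reference-based scoring loop: each generated proposition is compared against the annotator-provided implicit premise, and the scores are averaged over the test set. The sketch below uses a crude token-overlap F1 as a stand-in similarity function purely to make the loop runnable; in the actual evaluation, BERTScore (which matches contextual embeddings rather than surface tokens) takes its place, and all function names here are illustrative:

```python
def token_f1(candidate, reference):
    """Crude stand-in similarity: F1 over lower-cased token sets.
    The real evaluation uses BERTScore over contextual embeddings."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def corpus_score(system_outputs, references, scorer=token_f1):
    """Average reference-based score over all generated propositions."""
    scores = [scorer(out, ref) for out, ref in zip(system_outputs, references)]
    return sum(scores) / len(scores)

refs = ["Controlled immigration is desirable."]
outs = ["Controlled immigration is desirable."]
print(corpus_score(outs, refs))  # identical strings score 1.0
```

The human evaluation on the sampled subset complements this: embedding similarity can reward fluent paraphrases that miss the actual unstated assumption, which only a human judge reliably catches.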

Motivation and Background

Enthymemes — arguments with missing components — represent a fundamental challenge in understanding persuasive discourse. These implicit arguments are particularly prevalent in social media contexts, where they serve as powerful means of persuasion. By leaving key premises or conclusions unstated, enthymemes lead readers to perceive the implicit content as their own reasoning, making them especially effective rhetorical devices.

The significance of detecting and reconstructing enthymemes extends beyond theoretical interest in argumentation theory. Enthymemes facilitate deceptive argumentation and manipulation, and aid the spread of disinformation. Understanding how implicit premises operate in controversial political discourse is therefore crucial for developing tools to combat misinformation and promote critical thinking.

The task is inherently interpretative, involving natural language inference and semantic interpretation where high human disagreement is common. Our approach explicitly acknowledges this by employing multiple independent annotators per instance, enabling us to treat human label variation as signal rather than noise.

Data

The dataset consists of tweets annotated for the presence of enthymemes. For each enthymeme, annotators also propose a reconstruction of the implicit and explicit propositional content and argument structure. The tweets are a subset of the tropes dataset by Flaccavento et al. (2025), selected to balance two topics: immigration in the UK and the COVID-19 vaccine.

Train and dev sets are annotated by five annotators each; the test set by three. The data is released in three stages as outlined in the timeline above.

Target Group

This task is of interest to anyone working in text analysis, including researchers in natural language processing, argument mining, computational linguistics, misinformation detection, and social media analysis, in both academic and industrial settings.

We especially welcome interdisciplinary teams from argumentation theory, philosophy, rhetoric, communication studies, political science, and computational social science. Approaches of all sorts are encouraged, including explicit structural modelling, linguistic feature-based methods, and rule-based systems.

Quest for Insight

In addition to conventional benchmarking papers, participants are invited to submit "Quest for Insight" papers addressing a research question aimed at gaining deeper understanding of implicit argumentation. Full task results are not required for these papers. Example questions include:

References
