MediaEval 2026 Benchmarking Initiative
Centre for Language Studies, Radboud University Nijmegen
Martial Pastor — martial.pastor [at] ru.nl · Nelleke Oostdijk — nelleke.oostdijk [at] ru.nl
The goal of this task is to develop AI models capable of detecting and reconstructing implicit arguments — also called enthymemes — in political tweets. The dataset contains tweets annotated for the presence of implicit premises or conclusions, along with full argument reconstructions provided by multiple annotators.
The first data sample will be released on 1 March 2025. Participants are invited to complete two tasks. Task 1 may be completed on its own, but completion of Task 2 requires prior completion of Task 1.
Given a tweet, determine whether it contains an implicit premise, an implicit conclusion, or neither. This is a three-class classification task.
| Input | Output |
|---|---|
| The raw text of a tweet. | One label: implicit_premise, implicit_conclusion, or none. |
An implicit premise is a supporting assumption left unstated that the argument relies on. An implicit conclusion is a claim that follows from the stated premises but is never explicitly made. When neither component is missing, the label is none.
Tweets in the train and dev sets are each annotated by five independent annotators; those in the test set by three. Individual annotator labels — prior to any majority vote — are provided alongside the data, making it possible to treat disagreement as signal rather than noise.
In the first setting, the label must be predicted from the tweet text alone; no external data or additional annotation information is permitted.
In the second setting, participants may use the raw labels provided by three independent annotators in addition to the tweet text. The goal is to investigate whether modelling annotator disagreement improves performance, especially on borderline cases. The output is the same three-class prediction.
| Additional Input | Goal |
|---|---|
| Three individual annotator labels per tweet (before majority vote). | Leverage disagreement as a signal to improve the three-class prediction. |
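For instance, the annotator labels can be turned into a hard majority-vote target or a soft label distribution. A minimal sketch (the label order and the tie-breaking rule are our own assumptions, not part of the task definition):

```python
from collections import Counter

LABELS = ["implicit_premise", "implicit_conclusion", "none"]

def majority_label(votes):
    """Hard target: the most frequent label among the annotator votes.
    Ties are broken by the (assumed) order in LABELS."""
    counts = Counter(votes)
    return max(LABELS, key=lambda lab: (counts[lab], -LABELS.index(lab)))

def soft_label(votes):
    """Soft target: the empirical label distribution over the votes,
    usable e.g. as the target of a KL-divergence training loss."""
    counts = Counter(votes)
    return [counts[lab] / len(votes) for lab in LABELS]
```

A model trained on soft labels sees borderline tweets as genuinely mixed targets rather than forced single classes, which is one way to treat disagreement as signal.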
Any external data sources, pre-trained models, or additional resources may be used. Participants must document all external resources in their working-notes paper.
Performance is measured with macro F1-score. Evaluation is conducted in two modes:

- Three-class mode: implicit_premise, implicit_conclusion, and none are scored as separate classes.
- Binary mode: implicit_premise and implicit_conclusion are collapsed into a single implicit label, reducing the task to a binary distinction between tweets that contain any implicit argument component and those that do not. This mode rewards the ability to detect the presence of an enthymeme regardless of its structural role.
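Both evaluation modes can be reproduced in a few lines. A sketch in pure Python, assuming gold and predicted labels come as parallel lists (the official scorer may differ in edge-case conventions):

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def collapse(label):
    """Map both implicit classes onto the single 'implicit' label."""
    return "none" if label == "none" else "implicit"

THREE = ["implicit_premise", "implicit_conclusion", "none"]

gold = ["implicit_premise", "implicit_conclusion", "none"]
pred = ["implicit_conclusion", "implicit_conclusion", "none"]

three_class = macro_f1(gold, pred, THREE)
binary = macro_f1([collapse(g) for g in gold],
                  [collapse(p) for p in pred],
                  ["implicit", "none"])
```

Note how a system that confuses the two implicit classes is penalised in three-class mode but can still score perfectly in binary mode.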
For each tweet classified as containing an implicit argument, generate the text of the missing proposition. Task 2 requires prior completion of Task 1, as the predicted label is part of the input.
| Input | Output |
|---|---|
| Tweet text + Task 1 label (implicit_premise or implicit_conclusion). | A single natural-language sentence expressing the missing proposition. |
The generated sentence should be concise and declarative — it should make the unstated assumption or conclusion fully explicit, as if completing the argument. See the example below.
In this example, the system should output: "Controlled immigration is desirable."
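One natural Task 2 baseline is to prompt a text-generation model. The sketch below only shows how such an input might be assembled; build_prompt and the prompt wording are hypothetical, not part of the task:

```python
def build_prompt(tweet, task1_label):
    """Assemble an instruction for a hypothetical generation model."""
    missing = "premise" if task1_label == "implicit_premise" else "conclusion"
    return (
        f"The following tweet makes an argument with an unstated {missing}.\n"
        f"Tweet: {tweet}\n"
        f"State the missing {missing} as one concise, declarative sentence."
    )
```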
Task 2 systems are evaluated specifically on their ability to reconstruct the
implicit premise — the unstated supporting assumption the argument relies on.
The gold-standard reference for each tweet is the implicit premise provided by the annotators
in the dataset. In the annotation files, the implicit premise is identifiable as the
propositional content marked with (implicit), which annotators write as the
text of the missing proposition followed by that tag (e.g. "Vaccines cause harm to the
body (implicit).").
Generated propositions are evaluated in two ways. First, BERTScore is used to compare system outputs against the annotator-provided implicit premises. Second, a sampled subset of the test set is manually assessed by experienced annotators, who judge whether the generated proposition correctly captures the implicit assumption underlying the argument.
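The official automatic metric is BERTScore (available via the bert-score package), which compares contextual embeddings. For quick local sanity checks a much cruder lexical proxy can be useful; the token-overlap F1 below is our own stand-in, not the official metric:

```python
from collections import Counter

def token_f1(candidate, reference):
    """Token-overlap F1 between a generated proposition and the gold one."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    prec = overlap / len(cand)
    rec = overlap / len(ref)
    return 2 * prec * rec / (prec + rec)
```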
Enthymemes — arguments with missing components — represent a fundamental challenge in understanding persuasive discourse. These implicit arguments are particularly prevalent in social media contexts, where they serve as powerful means of persuasion. By leaving key premises or conclusions unstated, enthymemes lead readers to perceive the implicit content as their own reasoning, making them especially effective rhetorical devices.
The significance of detecting and reconstructing enthymemes extends beyond theoretical interest in argumentation theory. Enthymemes facilitate deceptive argumentation and manipulation, and help in spreading disinformation. Understanding how implicit premises operate in controversial political discourse is therefore crucial for developing tools to combat misinformation and promote critical thinking.
The task is inherently interpretative, involving natural language inference and semantic interpretation where high human disagreement is common. Our approach explicitly acknowledges this by employing multiple independent annotators per instance, enabling us to treat human label variation as signal rather than noise.
The dataset consists of tweets annotated for the presence of enthymemes. For each enthymeme, annotators also propose a reconstruction of the implicit and explicit propositional content and argument structure. The tweets are a subset of the tropes dataset by Flaccavento et al. (2025), selected to balance two topics: immigration in the UK and the COVID-19 vaccine.
Train and dev sets are annotated by five annotators each; the test set by three. The data is released in three stages as outlined in the timeline above.
This task is relevant to anyone working in text analysis, including researchers in natural language processing, argument mining, computational linguistics, misinformation detection, and social media analysis, in both academic and industrial settings.
We especially welcome interdisciplinary teams from argumentation theory, philosophy, rhetoric, communication studies, political science, and computational social science. Explicit structural modelling, linguistic feature-based approaches, and rule-based systems of all sorts are encouraged.
In addition to conventional benchmarking papers, participants are invited to submit "Quest for Insight" papers addressing a research question aimed at gaining deeper understanding of implicit argumentation. Full task results are not required for these papers. Example questions include:
- How do annotators disagree when choosing between implicit_premise, implicit_conclusion, and none?
- What linguistic patterns characterise the propositional content marked (implicit) in the data?