Publication result detail

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

MAKAIOVÁ, L.; FAJČÍK, M.; JAROLÍM, A.

Original title

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

English title

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

Type

Paper in conference proceedings indexed in WoS or Scopus

Original abstract

Document-level claim extraction remains an open challenge in the field of fact-checking, and consequently, methods for evaluating extracted claims have received limited attention. In this work, we explore approaches to aligning two sets of claims pertaining to the same source document and computing their similarity through an alignment score. We investigate techniques to identify the best possible alignment and evaluation method between claim sets, with the aim of providing a reliable evaluation framework. Our approach enables comparison between model-extracted and human-annotated claim sets, serving as a metric for assessing the extraction performance of models and also as a possible measure of inter-annotator agreement. We conduct experiments on a newly collected dataset—claims extracted from comments under Czech and Slovak news articles—domains that pose additional challenges due to the informal language, strong local context, and subtleties of these closely related languages. The results draw attention to the limitations of current evaluation approaches when applied to document-level claim extraction and highlight the need for more advanced methods—ones able to correctly capture semantic similarity and evaluate essential claim properties such as atomicity, checkworthiness, and decontextualization.

English abstract

Document-level claim extraction remains an open challenge in the field of fact-checking, and consequently, methods for evaluating extracted claims have received limited attention. In this work, we explore approaches to aligning two sets of claims pertaining to the same source document and computing their similarity through an alignment score. We investigate techniques to identify the best possible alignment and evaluation method between claim sets, with the aim of providing a reliable evaluation framework. Our approach enables comparison between model-extracted and human-annotated claim sets, serving as a metric for assessing the extraction performance of models and also as a possible measure of inter-annotator agreement. We conduct experiments on a newly collected dataset—claims extracted from comments under Czech and Slovak news articles—domains that pose additional challenges due to the informal language, strong local context, and subtleties of these closely related languages. The results draw attention to the limitations of current evaluation approaches when applied to document-level claim extraction and highlight the need for more advanced methods—ones able to correctly capture semantic similarity and evaluate essential claim properties such as atomicity, checkworthiness, and decontextualization.
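The record does not reproduce the paper's actual metric. As a rough illustration of the set-to-set claim alignment the abstract describes, here is a minimal sketch: a toy token-overlap (Jaccard) similarity stands in for the semantic similarity measure, and an exhaustive one-to-one matching stands in for the alignment search. Both choices are assumptions for illustration only, not the authors' method.

```python
from itertools import permutations

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for a semantic (e.g. embedding-based) score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align_claims(extracted: list[str], gold: list[str]):
    """Brute-force one-to-one alignment maximizing total pairwise similarity,
    normalized by the larger set so unmatched claims lower the score.
    Returned pairs map indices of the smaller set to indices of the larger one.
    Feasible only for small claim sets; the Hungarian algorithm scales better."""
    short, long_ = (extracted, gold) if len(extracted) <= len(gold) else (gold, extracted)
    if not long_:
        return 1.0, []  # two empty claim sets agree trivially
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(long_)), len(short)):
        pairs = list(zip(range(len(short)), perm))
        score = sum(jaccard(short[i], long_[j]) for i, j in pairs) / len(long_)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs
```

For example, comparing one model-extracted claim against two human-annotated ones yields an alignment of the extracted claim to its closest gold counterpart, with the unmatched gold claim pulling the score down:

```python
gold = ["the mayor approved the 2026 budget", "the budget cuts school funding"]
model = ["the 2026 budget was approved by the mayor"]
score, pairs = align_claims(model, gold)  # pairs maps model index -> gold index here
```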

Keywords

fact-checking, claim extraction, similarity metrics

Keywords in English

fact-checking, claim extraction, similarity metrics

Authors

MAKAIOVÁ, L.; FAJČÍK, M.; JAROLÍM, A.

Published

05.12.2025

ISBN

978-80-263-1858-3

Book

Proceedings of the Nineteenth Workshop on Recent Advances in Slavonic Natural Language Processing

Periodical

Recent Advances in Slavonic Natural Language Processing

Issue

2025

Country

Czech Republic

Pages from

15

Pages to

24

Page count

10

URL

https://raslan2025.nlp-consulting.net/

BibTeX

@inproceedings{BUT199486,
  author="Lucia {Makaiová} and Martin {Fajčík} and Antonín {Jarolím}",
  title="Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak",
  booktitle="Proceedings of the Nineteenth Workshop on Recent Advances in Slavonic Natural Language Processing",
  year="2025",
  journal="Recent Advances in Slavonic Natural Language Processing",
  number="2025",
  pages="15--24",
  isbn="978-80-263-1858-3",
  issn="2336-4289",
  url="https://raslan2025.nlp-consulting.net/"
}