Publication result detail

Content-Invariant Spatio-Temporal Neural Framework for Forgery Detection in Image Sequences

BUZOVSKÝ, V.; PŘINOSIL, J.; ŘÍHA, K.; SMÉKAL, Z.

Original Title

Content-Invariant Spatio-Temporal Neural Framework for Forgery Detection in Image Sequences

Type

Paper in proceedings (conference paper)

Original Abstract

The increasing prevalence of deepfake videos underscores the need for effective and reliable detection methods. In this study, we propose a hybrid deepfake detection framework that integrates a static image forgery detector with a recurrent neural network (RNN) to exploit both spatial and temporal features. Specifically, we utilize an existing frame-level detector that identifies common forgery artifacts within individual frames. This is followed by a Long Short-Term Memory (LSTM) network that models temporal dependencies across frames, enabling detection of inconsistencies that are overlooked in frame-by-frame analysis. Experimental results demonstrate that temporal modeling significantly improves accuracy over frame-level baselines. Our contributions are twofold: (i) we provide empirical evidence that deepfake videos exhibit detectable temporal signatures, and (ii) we construct a compact, real-world evaluation set of deepfake videos. Notably, detection performance on this dataset is lower than on standard benchmarks, suggesting a domain gap between commonly used training data and real-world deepfakes.
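
The two-stage data flow described in the abstract (per-frame features, then an LSTM over the frame sequence, then one video-level score) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `TinyLSTM`, `frame_detector`, and `classify_video` are hypothetical, the weights are random rather than trained, and the mean-intensity feature is only a stand-in for the output of a real frame-level forgery detector.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a logistic read-out (hypothetical sketch).

    Weights are random and untrained: this shows the data flow only.
    """

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def _gate(self, name, z, activation):
        return [
            activation(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(self.W[name], self.b[name])
        ]

    def forward(self, features):
        """Run over a sequence of feature vectors, return a score in (0, 1)."""
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        for x in features:
            z = list(x) + h  # concatenate current input and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # The final hidden state summarizes the whole sequence.
        return sigmoid(sum(w * hk for w, hk in zip(self.w_out, h)))


def frame_detector(frame):
    """Stand-in for a frame-level detector (assumption): mean intensity."""
    return [sum(frame) / len(frame)]


def classify_video(frames, lstm):
    """Score a video: per-frame features first, temporal model second."""
    return lstm.forward([frame_detector(frame) for frame in frames])


if __name__ == "__main__":
    # Three toy "frames", each a flat list of pixel intensities.
    video = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
    score = classify_video(video, TinyLSTM(input_size=1, hidden_size=4))
    print(f"video-level score: {score:.3f}")
```

Because the recurrent state carries information from frame to frame, the final score depends on the order of the frames and not only on their individual features, which is the property a purely frame-by-frame baseline lacks.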

Keywords

deepfake detection, computer vision, spatio-temporal features, recurrent neural networks, image manipulation.

Authors

BUZOVSKÝ, V.; PŘINOSIL, J.; ŘÍHA, K.; SMÉKAL, Z.

Released

03.12.2025

ISBN

979-8-3315-7675-2

Book

International Conference on Ultra Modern Telecommunications and Workshops

Periodical

International Congress on Ultra Modern Telecommunications and Workshops

State

United States of America

Pages from

240

Pages to

245

Pages count

5

BibTeX

@inproceedings{BUT199661,
  author="Viktor {Buzovský} and Jiří {Přinosil} and Kamil {Říha} and Zdeněk {Smékal}",
  title="Content-Invariant Spatio-Temporal Neural Framework for Forgery Detection in Image Sequences",
  booktitle="International Conference on Ultra Modern Telecommunications and Workshops",
  year="2025",
  journal="International Congress on Ultra Modern Telecommunications and Workshops",
  pages="240--245",
  doi="10.1109/ICUMT67815.2025.11268794",
  isbn="979-8-3315-7675-2"
}