Přístupnostní navigace
E-přihláška
Vyhledávání Vyhledat Zavřít
Detail publikačního výsledku
HAVLENA, V.; HOLÍK, L.; LENGÁL, O.; VAŠÁK, J.; GULČÍKOVÁ, S.
Originální název
Towards Efficient Matching of Regexes with Backreferences using Register Set Automata
Anglický název
Druh
Článek Scopus
Originální abstrakt
Matching regexes (regular expressions) is a common problem in many areas of computer science, with requirements on high speed and robust performance. Regexes with backreferences allow one to express certain patterns (even beyond regular) concisely, however, since the matching is usually done by backtracking, the matching speed can degrade to a degree that constitutes a service failure or a security threat. To facilitate high-speed matching of such regexes, we propose register set automata (RSAs), an extension of register automata where registers can contain sets of symbols (from a potentially infinite alphabet) and the following operations are supported: adding input values to registers, merging or clearing registers, and testing whether a register contains a value. We show that a large class of register automata can be transformed into deterministic RSAs, which can serve as a basis for fast matching of a family of regexes with single-letter capture groups and backreferences. We also give a derivative-based algorithm for transforming a large class of regexes with backreferences to register automata and show that the time complexity of matching is linear and quadratic to the length of the input for finite and infinite alphabets respectively. Our prototype implementation of a regex matcher shows that our approach can significantly improve the robustness of state-of-the-art regex matchers on regexes with backreferences. We also study the theoretical properties of the model and show that the emptiness problem for RSAs is decidable and complete for the F ω class and that RSAs are incomparable in expressive power to other popular automata models over data words.
Anglický abstrakt
Klíčová slova
Antimirov’s derivatives | backreferences | determinization | ReDoS | register automata | register set automata | regular expression matching
Klíčová slova v angličtině
Autoři
Vydáno
08.06.2026
Nakladatel
Association for Computing Machinery
Periodikum
Proceedings of the ACM on Programming Languages-PACMPL
Svazek
10
Číslo
PLDI
Stát
Spojené státy americké
Strany od
855
Strany do
880
Strany počet
26
URL
https://dl.acm.org/doi/10.1145/3808281
BibTex
@article{BUT211925, author="Vojtěch {Havlena} and Lukáš {Holík} and Ondřej {Lengál} and Jan {Vašák} and Sabína {Gulčíková}", title="Towards Efficient Matching of Regexes with Backreferences using Register Set Automata", journal="Proceedings of the ACM on Programming Languages-PACMPL", year="2026", volume="10", number="PLDI", pages="26", doi="10.1145/3808281", url="https://dl.acm.org/doi/10.1145/3808281" }