Regex Matching with Counting-Set Automata
We propose a solution to the problem of efficiently matching regular expressions (regexes) with bounded repetition, such as (ab){1,100}, using deterministic automata. For this, we introduce novel counting-set automata (CsAs), automata with registers that can hold sets of bounded integers and can be manipulated by a limited portfolio of constant-time operations. We present an algorithm that compiles a large sub-class of regexes to deterministic CsAs. This includes (1) a novel Antimirov-style translation of regexes with counting to counting automata (CAs), nondeterministic automata with bounded counters, and (2) our main technical contribution, a determinization of CAs that outputs CsAs. The main advantage of this workflow is that the size of the produced CsAs does not depend on the repetition bounds used in the regex (while the size of the DFA is exponential in them). Our experimental results confirm that deterministic CsAs produced from practical regexes with repetition are indeed vastly smaller than the corresponding DFAs. More importantly, our prototype matcher based on CsA simulation handles practical regexes with repetition regardless of the sizes of the counter bounds. It easily copes with regexes with repetition where state-of-the-art matchers struggle.
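The core idea, tracking the set of all possible values of a bounded counter with constant-time operations instead of enumerating those values in DFA states, can be illustrated on the regex .*a.{k} (an illustrative example chosen here, not taken from the abstract). Its minimal DFA over {a, b} needs 2^(k+1) states, whereas a single counting set suffices. The sketch below is a minimal, hypothetical Python illustration, assuming an offset-plus-deque representation of the counting set; the names `CountingSet` and `matches_dot_star_a_dot_k` are invented for this example, and the code is a hand-written simulation for this one regex, not the authors' data structure or their general CsA construction.

```python
from collections import deque


class CountingSet:
    """Hypothetical sketch of a set of bounded counter values.

    Each element is stored as the value of a global `offset` at the moment
    it was inserted as 0, so its current value is `offset - stamp`.
    Incrementing every element is then a single `offset += 1`, and the
    deque stays sorted: the oldest (largest) value is at the left end.
    """

    def __init__(self, bound: int) -> None:
        self.bound = bound      # values above this bound are discarded
        self.offset = 0
        self.stamps = deque()

    def increment_all(self) -> None:
        self.offset += 1
        # Values are distinct and all rise by exactly one, so at most the
        # single oldest element can now exceed the bound.
        if self.stamps and self.offset - self.stamps[0] > self.bound:
            self.stamps.popleft()

    def insert_zero(self) -> None:
        # A value of 0 would be the newest element; skip it if already present.
        if not self.stamps or self.stamps[-1] != self.offset:
            self.stamps.append(self.offset)

    def max_value(self):
        return self.offset - self.stamps[0] if self.stamps else None


def matches_dot_star_a_dot_k(word: str, k: int) -> bool:
    """Full-match `word` against the regex .*a.{k} using one counting set."""
    counters = CountingSet(bound=k)
    for ch in word:
        counters.increment_all()    # every symbol may be one of the .{k}
        if ch == "a":
            counters.insert_zero()  # this 'a' may be the one before .{k}
    # After pruning, every value is <= k, so k is present iff it is the max.
    return counters.max_value() == k
```

For example, `matches_dot_star_a_dot_k("a" + "b" * 100, 100)` returns `True`, while the same call on `"a" + "b" * 99` returns `False`. Each input symbol costs a constant amount of work whatever the value of k, which mirrors, in miniature, the abstract's claim that the cost does not depend on the repetition bounds.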
Fri 22 Oct | Displayed time zone: Central Time (US & Canada)
10:50 - 12:10
10:50 | 15m Talk | DiffStream: Differential Output Testing for Stream Processing Programs (SIGPLAN Papers) | Konstantinos Kallas (University of Pennsylvania), Filip Niksic (Google), Caleb Stanford (University of Pennsylvania), Rajeev Alur (University of Pennsylvania)
11:05 | 15m Talk | Guided Linking: Dynamic Linking Without the Costs (SIGPLAN Papers) | Sean Bartell (University of Illinois at Urbana-Champaign), Will Dietz (University of Illinois at Urbana-Champaign), Vikram S. Adve (University of Illinois at Urbana-Champaign, USA)
11:20 | 15m Talk | Regex Matching with Counting-Set Automata (SIGPLAN Papers) | Lukáš Holík (Brno University of Technology), Ondřej Lengál (Brno University of Technology), Olli Saarikivi (Microsoft), Lenka Turoňová (Brno University of Technology), Margus Veanes (Microsoft), Tomáš Vojnar (Brno University of Technology)
11:35 | 15m Talk (In-Person) | Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API Usages, and Differences (SIGPLAN Papers) | Mehdi Bagherzadeh (Oakland University), Nicholas Fireman (Oakland University), Anas Shawesh (Oakland University), Raffi Khatchadourian (CUNY Hunter College)
11:50 | 20m Live Q&A | Discussion, Questions and Answers (SIGPLAN Papers)