SPLASH 2021
Sun 17 - Fri 22 October 2021 Chicago, Illinois, United States
Fri 22 Oct 2021 10:50 - 11:05 at Zurich C - Algorithms, Libraries and Databases Chair(s): Hans-J. Boehm
Fri 22 Oct 2021 18:50 - 19:05 at Zurich C - Algorithms, Libraries and Databases - mirror Chair(s): Fabian Muehlboeck

Many data processing systems allow SQL queries that call \emph{user-defined functions (UDFs)} written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL's highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high-level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. However, because there is no universal decomposition strategy that works for all UDFs, we propose a novel \emph{lazy inductive synthesis} approach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS.

We have implemented our proposed technique in a tool called {\sc CLIS} for optimizing Spark SQL programs containing Scala UDFs. To evaluate {\sc CLIS}, we manually study 100 randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that {\sc CLIS} can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve $2.4\times$ more benchmarks compared to a baseline that does not use our compositional approach. We also show that {\sc CLIS} yields an average speed-up of~$3.5\times$ for individual UDFs and $1.3\times$ to $3.1\times$ in terms of end-to-end application performance.

Fri 22 Oct

Displayed time zone: Central Time (US & Canada) change

10:50 - 12:10
Algorithms, Libraries and DatabasesOOPSLA at Zurich C +8h
Chair(s): Hans-J. Boehm Google
10:50
15m
Talk
UDF to SQL Translation through Compositional Lazy Inductive SynthesisVirtual
OOPSLA
Guoqiang Zhang North Carolina State University; Facebook, Yuanchao Xu North Carolina State University, Xipeng Shen North Carolina State University; Facebook, Isil Dillig University of Texas at Austin
DOI
11:05
15m
Talk
LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)Virtual
OOPSLA
Guy L. Steele Jr. Oracle Labs, Sebastiano Vigna University of Milan
DOI
11:20
15m
Talk
FPL: Fast Presburger Arithmetic through TransprecisionIn-PersonDistinguished Paper
OOPSLA
Arjun Pitchanathan IIIT Hyderabad, Christian Ulmann ETH Zurich, Michel Weber ETH Zurich, Torsten Hoefler ETH Zurich, Tobias Grosser University of Edinburgh
DOI
11:35
15m
Talk
Verifying Concurrent Multicopy Search StructuresIn-Person
OOPSLA
Nisarg Patel New York University, Siddharth Krishna Microsoft Research, Dennis Shasha New York University, Thomas Wies New York University
DOI
11:50
20m
Live Q&A
Discussion, Questions and Answers
OOPSLA

18:50 - 20:10
Algorithms, Libraries and Databases - mirrorOOPSLA at Zurich C
Chair(s): Fabian Muehlboeck IST Austria
18:50
15m
Talk
UDF to SQL Translation through Compositional Lazy Inductive SynthesisVirtual
OOPSLA
Guoqiang Zhang North Carolina State University; Facebook, Yuanchao Xu North Carolina State University, Xipeng Shen North Carolina State University; Facebook, Isil Dillig University of Texas at Austin
DOI
19:05
15m
Talk
LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)Virtual
OOPSLA
Guy L. Steele Jr. Oracle Labs, Sebastiano Vigna University of Milan
DOI
19:20
15m
Talk
FPL: Fast Presburger Arithmetic through TransprecisionIn-PersonDistinguished Paper
OOPSLA
Arjun Pitchanathan IIIT Hyderabad, Christian Ulmann ETH Zurich, Michel Weber ETH Zurich, Torsten Hoefler ETH Zurich, Tobias Grosser University of Edinburgh
DOI
19:35
15m
Talk
Verifying Concurrent Multicopy Search StructuresIn-Person
OOPSLA
Nisarg Patel New York University, Siddharth Krishna Microsoft Research, Dennis Shasha New York University, Thomas Wies New York University
DOI
19:50
20m
Live Q&A
Discussion, Questions and Answers
OOPSLA