UDF to SQL Translation through Compositional Lazy Inductive SynthesisVirtual
Fri 22 Oct 2021 18:50 - 19:05 at Zurich C - Algorithms, Libraries and Databases - mirror Chair(s): Fabian Muehlboeck
Many data processing systems allow SQL queries that call \emph{user-defined functions (UDFs)} written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL's highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high-level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. However, because there is no universal decomposition strategy that works for all UDFs, we propose a novel \emph{lazy inductive synthesis} approach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS.
We have implemented our proposed technique in a tool called {\sc CLIS} for optimizing Spark SQL programs containing Scala UDFs. To evaluate {\sc CLIS}, we manually study 100 randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that {\sc CLIS} can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve $2.4\times$ more benchmarks compared to a baseline that does not use our compositional approach. We also show that {\sc CLIS} yields an average speed-up of~$3.5\times$ for individual UDFs and $1.3\times$ to $3.1\times$ in terms of end-to-end application performance.
Fri 22 OctDisplayed time zone: Central Time (US & Canada) change
10:50 - 12:10 | |||
10:50 15mTalk | UDF to SQL Translation through Compositional Lazy Inductive SynthesisVirtual OOPSLA Guoqiang Zhang North Carolina State University; Facebook, Yuanchao Xu North Carolina State University, Xipeng Shen North Carolina State University; Facebook, Işıl Dillig University of Texas at Austin DOI | ||
11:05 15mTalk | LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)Virtual OOPSLA DOI | ||
11:20 15mTalk | FPL: Fast Presburger Arithmetic through TransprecisionIn-Person OOPSLA Arjun Pitchanathan IIIT Hyderabad, Christian Ulmann ETH Zurich, Michel Weber ETH Zurich, Torsten Hoefler ETH Zurich, Tobias Grosser University of Edinburgh DOI | ||
11:35 15mTalk | Verifying Concurrent Multicopy Search StructuresIn-Person OOPSLA Nisarg Patel New York University, Siddharth Krishna Microsoft Research, Dennis Shasha New York University, Thomas Wies New York University DOI | ||
11:50 20mLive Q&A | Discussion, Questions and Answers OOPSLA |