IOOpt- Automatic Derivation of I/O complexity bounds for affine programs
Evaluating the complexity of an algorithm is an important step when developing applications as it highly impact both its time and energy performance. The arithmetic complexity which is the number of operations, regardless of the execution order, is, for neural networks (or more generally affine programs), easy to characterize. The data-movement (or, I/O) complexity characterization is way more complex as it refers to the minimum required number of I/O between a slow (e.g. main memory) and a fast (e.g. local scratchpad) storage location when considering all possible valid schedules.
This paper presents \IOOpt, a fully automated tool that automatically bounds the data movement of an affine (tilable) program. Given a tilable program described in a DSL, it automatically computes: 1.~a lower bound of the I/O complexity as a symbolic expression of the cache size and program parameters; 2.~an upper bound that allows to assess the tightness of the lower bound; 3.~a tiling recommendation (loop permutation and tile’s sizes) that matches the upper bound. While the associated algorithms for computing lower bound (and the associated implementations) that are described in this paper can be applied to any affine programs, a substantial effort has been made to provide bounds that are as tight as possible for neural networks. In particular, it extends the previous work of Olivry et al. so as to handle multi-dimensional reductions and expose the constraints associated to small dimensions that are both present in convolutions.
As for the upper bound algorithm, if the input can be a general tilable affine program (e.g. output of a polyhedral compiler such as PluTo), the involved algebraic computations are tuned to behave well on tensor computations such as direct tensor contractions or direct convolutions. As a bonus, the upper bound algorithm that has been extended to multi-level cache can provide a useful tiling recommendation to the programmer.
We demonstrate the effectiveness of our tool by deriving the symbolic lower and upper bounds for several tensor contraction and convolution kernels. Then, we evaluate numerically the tightness of our bound using Yolo9000 convolution layer sizes and representative tensor contractions from the TCCG benchmark suite. Finally, we show the pertinence of our I/O complexity model by reporting the run-time of the recommended tiled code for the convolution layers from Yolo9000.
Thu 21 OctDisplayed time zone: Central Time (US & Canada) change
15:40 - 17:00
PLDI 2021 Papers 3SIGPLAN Papers at Zurich G
Chair(s): Fredrik Kjolstad Stanford University
|Automatically Enforcing Fresh and Consistent Inputs in Intermittent Systems|
Milijana Surbatovich Carnegie Mellon University, Limin Jia Carnegie Mellon University, Brandon Lucia Carnegie Mellon University, USA
|IOOpt- Automatic Derivation of I/O complexity bounds for affine programs|
Auguste Olivry Inria, France, Guillaume Iooss Inria, Nicolas Tollenaere Inria, Atanas Rountev Ohio State University, Saday Sadayappan University of Utah, USA, Fabrice Rastello Inria, France
|Integration Verification Across Software and Hardware for a Simple Embedded System|
Andres Erbsen MIT, Samuel Gruetter Massachusetts Institute of Technology, Joonwon Choi Massachusetts Institute of Technology, USA, Clark Wood Massachusetts Institute of Technology, Adam Chlipala Massachusetts Institute of Technology
|Execution reconstruction: Harnessing failure reoccurrences for failure reproduction|
Gefei Zuo University of Michigan, Jiacheng Ma University of Michigan, Andrew Quinn University of Michigan, Pramod Bhatotia University of Edinburgh, Pedro Fonseca Purdue University, Baris Kasikci University of Michigan, USA
|Discussion, Questions and Answers|