IOOpt- Automatic Derivation of I/O complexity bounds for affine programs (SPLASH 2021 - Recent SIGPLAN Conference Paper Presentations)

Sun 17 - Fri 22 October 2021 Chicago, Illinois, United States

Who

Auguste Olivry, Guillaume Iooss, Nicolas Tollenaere, Atanas Rountev, Saday Sadayappan, Fabrice Rastello

Track

SPLASH 2021 SIGPLAN Papers

Time Zone

The program is currently displayed in (GMT-05:00) Central Time (US & Canada).

Use conference time zone: (GMT-05:00) Central Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 21 Oct 2021 15:55 - 16:10 at Zurich G - PLDI 2021 Papers 3 Chair(s): Fredrik Kjolstad

Abstract

Evaluating the complexity of an algorithm is an important step when developing applications as it highly impact both its time and energy performance. The arithmetic complexity which is the number of operations, regardless of the execution order, is, for neural networks (or more generally affine programs), easy to characterize. The data-movement (or, I/O) complexity characterization is way more complex as it refers to the minimum required number of I/O between a slow (e.g. main memory) and a fast (e.g. local scratchpad) storage location when considering all possible valid schedules.

This paper presents \IOOpt, a fully automated tool that automatically bounds the data movement of an affine (tilable) program. Given a tilable program described in a DSL, it automatically computes: 1.~a lower bound of the I/O complexity as a symbolic expression of the cache size and program parameters; 2.~an upper bound that allows to assess the tightness of the lower bound; 3.~a tiling recommendation (loop permutation and tile’s sizes) that matches the upper bound. While the associated algorithms for computing lower bound (and the associated implementations) that are described in this paper can be applied to any affine programs, a substantial effort has been made to provide bounds that are as tight as possible for neural networks. In particular, it extends the previous work of Olivry et al. so as to handle multi-dimensional reductions and expose the constraints associated to small dimensions that are both present in convolutions.
As for the upper bound algorithm, if the input can be a general tilable affine program (e.g. output of a polyhedral compiler such as PluTo), the involved algebraic computations are tuned to behave well on tensor computations such as direct tensor contractions or direct convolutions. As a bonus, the upper bound algorithm that has been extended to multi-level cache can provide a useful tiling recommendation to the programmer.

We demonstrate the effectiveness of our tool by deriving the symbolic lower and upper bounds for several tensor contraction and convolution kernels. Then, we evaluate numerically the tightness of our bound using Yolo9000 convolution layer sizes and representative tensor contractions from the TCCG benchmark suite. Finally, we show the pertinence of our I/O complexity model by reporting the run-time of the recommended tiled code for the convolution layers from Yolo9000.

Auguste Olivry

Inria, France

Guillaume Iooss

Inria

Nicolas Tollenaere

Inria

Atanas Rountev

Ohio State University

Saday Sadayappan

University of Utah, USA

Fabrice Rastello