
Autotuning divide‐and‐conquer stencil computations

Summary: This paper explores autotuning strategies for serial divide‐and‐conquer stencil computations, comparing the efficacy of traditional “heuristic” autotuning with that of “pruned‐exhaustive” autotuning. We present a pruned‐exhaustive autotuner called Ztune that searches for optimal divide‐and‐conquer trees for stencil computations. Ztune uses three pruning properties (space‐time equivalence, divide subsumption, and favored dimension) that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We compared the performance of Ztune with that of a state‐of‐the‐art heuristic autotuner called OpenTuner in tuning the divide‐and‐conquer algorithm used in the Pochoir stencil compiler. Over a nightly run on ten application benchmarks across two machines with different hardware configurations, the Ztuned code ran 5%–12% faster on average, and the OpenTuner‐tuned code ran from 9% slower to 2% faster on average, than Pochoir's default code. In the best case, the Ztuned code ran 40% faster, and the OpenTuner‐tuned code ran 33% faster, than Pochoir's code. Whereas the autotuning time of Ztune for each benchmark could be measured in minutes, the autotuning time OpenTuner needed to achieve comparable results was typically measured in hours or days. Surprisingly, for some benchmarks, Ztune actually autotuned faster than the time it takes to perform the stencil computation once.
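To make the idea of searching over divide‐and‐conquer trees concrete, here is a toy sketch in Python. It is not Ztune's algorithm: the cost model (`base_cost`), the constant `DIVIDE_OVERHEAD`, and the function `best_plan` are all hypothetical names invented for illustration, and a real autotuner would measure run times rather than evaluate a formula. The abstract does not define the three pruning properties, so none of them is modeled here; the memoization is only a generic way to avoid re-searching blocks of equal size.

```python
from functools import lru_cache

DIVIDE_OVERHEAD = 1.0  # hypothetical cost charged for one recursive split


def base_cost(width, height):
    """Hypothetical cost of running a looping base case on a block.

    Blocks at most 64 points wide (a pretend cache size) cost 1 unit
    per space-time point; wider blocks cost 3 units per point.
    """
    points = width * height
    return points * (1.0 if width <= 64 else 3.0)


@lru_cache(maxsize=None)  # each (width, height) shape is searched only once
def best_plan(width, height):
    """Return (cost, choice) for the cheapest way to process a block.

    choice is "base" (stop and loop), "space" (halve the width),
    or "time" (halve the height).
    """
    candidates = [(base_cost(width, height), "base")]
    if width >= 2:
        left, right = width // 2, width - width // 2
        cost = (best_plan(left, height)[0]
                + best_plan(right, height)[0]
                + DIVIDE_OVERHEAD)
        candidates.append((cost, "space"))
    if height >= 2:
        lower, upper = height // 2, height - height // 2
        cost = (best_plan(width, lower)[0]
                + best_plan(width, upper)[0]
                + DIVIDE_OVERHEAD)
        candidates.append((cost, "time"))
    # min() keeps the first of any tied candidates, so "base" wins ties.
    return min(candidates, key=lambda c: c[0])


print(best_plan(64, 8))   # block already fits the pretend cache
print(best_plan(256, 8))  # block is too wide, so the search divides it
```

Under this made-up cost model, a block that is 64 points wide is cheapest as a base case, while a block that is 256 points wide is cheapest when cut in space until each piece is 64 wide. The exhaustive part is that every legal choice is evaluated at every node; pruning rules such as those named in the abstract would shrink that set of choices.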

Related Results

Effect of stencil wall aperture on solder paste release via stencil printing
Abstract Solder paste printing is a process by which the correct amount of solder paste is applied to the printed circuit board via a stencil. The solder release fro...
Autotuning with machine learning of OpenMP task applications
Machine-learning-assisted autotuning of OpenMP tasks. Modern computer architectures are highly complex, requiring a great effort of progr...
Toward transparent and parsimonious methods for automatic performance tuning
The end of Moore's law and of Dennard's law lead to a ...
Stencil Aperture Area Ratio Extension - Impact of Stencil Technology and Coating
ABSTRACT Continued miniaturization of personal computing systems with increasing densities drives the need for consistent solder paste print deposits to ensure m...
Local hilltop and debris-flow morphometrics predict drainage divide migration
In terrestrial landscapes, neighboring catchments that experience contrasting erosion rates can be in disequilibrium such that drainage divides migrate. Cross-divide differences in...
Stencils for Mixed Flip Chip / SMT Assembly
ABSTRACT The requirement to combine FC (Flip Chip) and SMT assembly on the same substrate has increased dramatically with the demand for smaller and smaller assem...
Grain size signature of divide migration is restricted to local hillslope scale
Recent work has shown both that drainage divides shift location over geologic timescales in response to contrasts in erosion rates and that fluvial and hillslope grain size is corr...
