Autotuning divide‐and‐conquer stencil computations
Summary
This paper explores autotuning strategies for serial divide‐and‐conquer stencil computations, comparing the efficacy of traditional “heuristic” autotuning with that of “pruned‐exhaustive” autotuning.
We present a pruned‐exhaustive autotuner called Ztune that searches for optimal divide‐and‐conquer trees for stencil computations.
Ztune uses three pruning properties—space‐time equivalence, divide subsumption, and favored dimension—that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code.
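The summary does not describe Ztune's search procedure in detail. As a rough illustration only, the Python sketch below runs an exhaustive search over divide-and-conquer trees for a one-dimensional zoid (trapezoid). At each node it chooses among a base-case loop, a space cut, and a time cut. All cost constants and the cache model are hypothetical stand-ins for measured running times. The cache keys only on a zoid's shape (height `dt`, bottom width `w`, edge slopes `dx0` and `dx1`) and never on its position. That choice is an assumed reading of space-time equivalence, not the paper's definition. Divide subsumption and favored dimension are not modeled.

```python
SIGMA = 1               # slope of a 3-point 1D stencil (reads x-1, x, x+1)
LOOP_OVERHEAD = 20.0    # hypothetical fixed cost of running one base-case loop
CUT_OVERHEAD = 5.0      # hypothetical fixed cost of one recursive cut
CACHE_WIDTH = 64        # hypothetical: rows wider than this fall out of cache
MISS_PENALTY = 3.0      # hypothetical per-point cost once rows fall out of cache


def loop_cost(dt, w, dx0, dx1):
    """Toy cost of sweeping a zoid directly (a stand-in for a timed run)."""
    widths = [max(0, w + (dx1 - dx0) * i) for i in range(dt)]
    per_point = 1.0 if max(widths) <= CACHE_WIDTH else MISS_PENALTY
    return LOOP_OVERHEAD + per_point * sum(widths)


def cuts(dt, w, dx0, dx1):
    """Yield (kind, child shapes) for each legal cut of a zoid shape."""
    if 2 * w + (dx1 - dx0) * dt >= 4 * SIGMA * dt:
        # Space cut along a line of slope -SIGMA (trapezoidal decomposition).
        left = (2 * w + (2 * SIGMA + dx0 + dx1) * dt) // 4
        yield "space", [(dt, left, dx0, -SIGMA), (dt, w - left, -SIGMA, dx1)]
    if dt > 1:
        # Time cut: lower half, then upper half (whose bottom row has moved).
        s = dt // 2
        yield "time", [(s, w, dx0, dx1), (dt - s, w + (dx1 - dx0) * s, dx0, dx1)]


def search(shape, memo=None):
    """Return (cost, tree) for the cheapest divide-and-conquer tree of `shape`.

    A tree is "loop" or (kind, [subtree, subtree]). With a memo dict, the
    result for a shape is reused for every zoid of that shape, wherever it
    sits in space-time; without one, shared subproblems are re-explored.
    """
    if memo is not None and shape in memo:
        return memo[shape]
    best = (loop_cost(*shape), "loop")
    for kind, kids in cuts(*shape):
        sub = [search(k, memo) for k in kids]
        cost = CUT_OVERHEAD + sum(c for c, _ in sub)
        if cost < best[0]:
            best = (cost, (kind, [tree for _, tree in sub]))
    if memo is not None:
        memo[shape] = best
    return best


if __name__ == "__main__":
    memo = {}
    cost, tree = search((64, 2048, 0, 0), memo)
    print(cost, len(memo), tree if tree == "loop" else tree[0])
```

Because this toy cost depends only on shape, caching by shape loses nothing here. With the memo, each distinct shape is evaluated once, so the 64-step, 2048-point zoid is searched quickly. In a real autotuner that times actual code, treating same-shaped zoids as equivalent would be an approximation rather than an exact identity.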
We compared the performance of Ztune with that of a state‐of‐the‐art heuristic autotuner called OpenTuner in tuning the divide‐and‐conquer algorithm used in the Pochoir stencil compiler.
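For readers unfamiliar with the kind of algorithm being tuned, the following is a one-dimensional Python sketch in the style of Frigo and Strumpen's trapezoidal walk, on which Pochoir's algorithm builds. It is not Pochoir's code. Several details are assumptions made for illustration: the 3-point averaging kernel, the fixed boundary points, the two-buffer storage, and the `max_dt` coarsening knob. A knob like `max_dt` is the kind of parameter a heuristic autotuner could search over, whereas Ztune, per the summary, searches over the divide-and-conquer trees themselves.

```python
import random

SIGMA = 1  # dependency slope: each point reads x-1, x, x+1 of the previous step


def make_kernel(u):
    """Toy 3-point stencil; time step t lives in u[t % 2] (two-buffer storage)."""
    def kernel(t, x):
        src, dst = u[t % 2], u[(t + 1) % 2]
        dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]
    return kernel


def walk(kernel, t0, t1, x0, dx0, x1, dx1, max_dt=1):
    """Divide-and-conquer walk over the zoid whose row at time t is
    [x0 + dx0*(t-t0), x1 + dx1*(t-t0)), for t in [t0, t1).

    max_dt is a hypothetical coarsening knob: zoids at most max_dt steps
    tall are swept directly instead of being cut further."""
    dt = t1 - t0
    if dt <= 0:
        return
    if dt <= max_dt:
        # Base case: sweep the zoid row by row in time order.
        for t in range(t0, t1):
            k = t - t0
            for x in range(x0 + dx0 * k, x1 + dx1 * k):
                kernel(t, x)
    elif 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * SIGMA * dt:
        # Space cut along a line of slope -SIGMA: the left zoid does not
        # depend on the right one, so it runs first.
        xm = (2 * (x0 + x1) + (2 * SIGMA + dx0 + dx1) * dt) // 4
        walk(kernel, t0, t1, x0, dx0, xm, -SIGMA, max_dt)
        walk(kernel, t0, t1, xm, -SIGMA, x1, dx1, max_dt)
    else:
        # Time cut: finish the lower half before starting the upper half.
        s = dt // 2
        walk(kernel, t0, t0 + s, x0, dx0, x1, dx1, max_dt)
        walk(kernel, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1, max_dt)


def naive(kernel, T, N):
    """Reference loop nest: every interior point, one time step at a time."""
    for t in range(T):
        for x in range(1, N - 1):
            kernel(t, x)


if __name__ == "__main__":
    T, N = 50, 200
    init = [random.random() for _ in range(N)]
    a, b = [init[:], init[:]], [init[:], init[:]]
    walk(make_kernel(a), 0, T, 1, 0, N - 1, 0, max_dt=4)  # x = 0 and N-1 stay fixed
    naive(make_kernel(b), T, N)
    print(a[T % 2] == b[T % 2])
```

The walk visits every interior point once, in an order that respects the stencil's dependencies. Each point is therefore computed from the same inputs as in the plain loop nest, and the two results should match exactly.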
Over a nightly run on ten application benchmarks across two machines with different hardware configurations, the Ztuned code ran 5%–12% faster on average than Pochoir's default code, whereas the OpenTuner‐tuned code ranged from 9% slower to 2% faster on average.
In the best case, the Ztuned code ran 40% faster than Pochoir's code, and the OpenTuner‐tuned code ran 33% faster.
Whereas Ztune's autotuning time for each benchmark could be measured in minutes, OpenTuner typically needed hours or days of autotuning to achieve comparable results.
Surprisingly, for some benchmarks, Ztune finished autotuning in less time than it takes to perform the stencil computation once.