Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems
2015 (English)In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, Vol. 11, no 4, 62- p.Article in journal (Refereed) Published
Exploiting effectively massively parallel architectures is a major challenge that stream programming can help facilitate. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider crown scheduling, a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic Longest Task, Lowest Group (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the Height heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.
Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium-sized streaming task collections even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespan and energy consumption close to optimality within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than the one of other algorithms we test, but our heuristics consistently produce better solutions.
Place, publisher, year, edition, pages
New York, NY, USA: ACM Digital Library, 2015. Vol. 11, no 4, 62- p.
Multicore, frequency scaling, manycore, mapping, parallel energy, scheduling, streaming
IdentifiersURN: urn:nbn:se:liu:diva-114280DOI: 10.1145/2687653ISI: 000348232000028OAI: oai:DiVA.org:liu-114280DiVA: diva2:788822
The 10th HiPEAC conference, January 19-21, Amsterdam, The Netherlands
FunderEU, FP7, Seventh Framework Programme, EXCESSSwedish e‐Science Research Center, OpCoReS