liu.seSearch for publications in DiVA
Change search
ReferencesLink to record
Permanent link

Direct link
Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology. (PELAB)ORCID iD: 0000-0002-1940-3331
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.ORCID iD: 0000-0001-5241-0026
FernUniversität in Hagen, Germany (Parallelität und VLSI).
FernUniversität in Hagen, Germany (Parallelität und VLSI).
2015 (English)In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, Vol. 11, no 4, 62- p.Article in journal (Refereed) Published
Abstract [en]

Exploiting effectively massively parallel architectures is a major challenge that stream programming can help facilitate. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider crown scheduling, a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic Longest Task, Lowest Group (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the Height heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.

Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium-sized streaming task collections even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespan and energy consumption close to optimality within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than the one of other algorithms we test, but our heuristics consistently produce better solutions.

Place, publisher, year, edition, pages
New York, NY, USA: ACM Digital Library, 2015. Vol. 11, no 4, 62- p.
Keyword [en]
Multicore, frequency scaling, manycore, mapping, parallel energy, scheduling, streaming
National Category
Computer Systems
URN: urn:nbn:se:liu:diva-114280DOI: 10.1145/2687653ISI: 000348232000028OAI: diva2:788822
The 10th HiPEAC conference, January 19-21, Amsterdam, The Netherlands
EU, FP7, Seventh Framework Programme, EXCESSSwedish e‐Science Research Center, OpCoReS
Available from: 2015-02-16 Created: 2015-02-16 Last updated: 2015-03-02Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Melot, NicolasKessler, Christoph
By organisation
Software and SystemsThe Institute of Technology
Computer Systems

Search outside of DiVA

GoogleGoogle ScholarTotal: 16 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 54 hits
ReferencesLink to record
Permanent link

Direct link