Publications (10 of 111)
Ernstsson, A., Griebler, D. & Kessler, C. (2023). Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems. International journal of parallel programming, 51, 61-82
2023 (English). In: International journal of parallel programming, ISSN 0885-7458, E-ISSN 1573-7640, Vol. 51, p. 61-82. Article in journal (Refereed). Published.
Abstract [en]

We analyze the performance portability of the skeleton-based, single-source multi-backend high-level programming framework SkePU across multiple different CPU-GPU heterogeneous systems. Thereby, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in more low-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80% and in particular for CPU systems, SkePU can outperform hand-written code.
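
As a concrete illustration of the kind of hand-written baseline such an efficiency comparison is made against, the sketch below shows a STREAM-style triad kernel written directly in OpenMP, together with one common convention for application efficiency (baseline time divided by framework time). Both the code and the efficiency definition are generic illustrations under assumed conventions, not taken from the paper.

    #include <omp.h>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Hand-written OpenMP version of the STREAM "triad" kernel: a[i] = b[i] + s * c[i].
    void triad(std::vector<double>& a, const std::vector<double>& b,
               const std::vector<double>& c, double s)
    {
        #pragma omp parallel for
        for (long i = 0; i < (long)a.size(); ++i)
            a[i] = b[i] + s * c[i];
    }

    // One common convention for application efficiency of framework-generated code
    // relative to a hand-written baseline: t_baseline / t_framework
    // (1.0 means parity, values above 1.0 mean the framework code is faster).
    double application_efficiency(double t_baseline, double t_framework)
    {
        return t_baseline / t_framework;
    }

    int main()
    {
        const std::size_t n = 1 << 24;
        std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
        double t0 = omp_get_wtime();
        triad(a, b, c, 3.0);
        double t1 = omp_get_wtime();
        std::printf("triad: %.4f s\n", t1 - t0);
        return 0;
    }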

Place, publisher, year, edition, pages
Springer / Plenum, 2023
Keywords
Algorithmic skeletons; Parallel efficiency; Performance portability; Heterogeneous parallel computing; High-level parallel programming
National Category
Embedded Systems
Identifiers
urn:nbn:se:liu:diva-190797 (URN)
10.1007/s10766-022-00746-1 (DOI)
000894718600001 ()
Note

Funding agencies: ELLIIT; project GPAI; [SNIC 2021/22-971]

Available from: 2023-01-03 Created: 2023-01-03 Last updated: 2024-02-06. Bibliographically approved
Ernstsson, A., Vandenbergen, N., Keller, J. & Kessler, C. (2022). A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems. International journal of parallel programming, 50, 319-340
2022 (English). In: International journal of parallel programming, ISSN 0885-7458, E-ISSN 1573-7640, Vol. 50, p. 319-340. Article in journal (Refereed). Published.
Abstract [en]

SkePU is a pattern-based high-level programming model for transparent program execution on heterogeneous parallel computing systems. A key feature of SkePU is that, in general, the selection of the execution platform for a skeleton-based function call need not be determined statically. On single-node systems, SkePU can select among CPU, multithreaded CPU, single or multi-GPU execution. Many scientific applications use pseudo-random number generators (PRNGs) as part of the computation. In the interest of correctness and debugging, deterministic parallel execution is a desirable property, which however requires a deterministically parallelized pseudo-random number generator. We present the API and implementation of a deterministic, portable parallel PRNG extension to SkePU that is scalable by design and exhibits the same behavior regardless where and with how many resources it is executed. We evaluate it with four probabilistic applications and show that the PRNG enables scalability on both multi-core CPU and GPU resources, and hence supports the universal portability of SkePU code even in the presence of PRNG calls, while source code complexity is reduced.
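
The property described here, namely that each random draw depends only on the element index and the seed rather than on which core or backend processes the element, can be sketched with a counter-based generator. The snippet below is a free-standing illustration of that principle using a splitmix64-style hash; it is not SkePU's PRNG API or implementation.

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Counter-based generator: the value is a pure function of (seed, index),
    // so any partitioning of the index range over cores or GPUs yields the
    // same value per element, i.e. deterministic parallel execution.
    std::uint64_t element_rng(std::uint64_t seed, std::uint64_t index)
    {
        std::uint64_t z = seed + (index + 1) * 0x9E3779B97F4A7C15ULL; // splitmix64-style mixing
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }

    // Element-wise use, e.g. in a Monte Carlo style computation: the result is
    // identical regardless of how many threads or which backend execute the loop.
    void fill_uniform(std::vector<double>& v, std::uint64_t seed)
    {
        #pragma omp parallel for
        for (long i = 0; i < (long)v.size(); ++i)
            v[i] = element_rng(seed, (std::uint64_t)i) * (1.0 / 18446744073709551616.0); // divide by 2^64 -> [0,1)
    }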

Place, publisher, year, edition, pages
Springer / Plenum, 2022
Keywords
Skeleton programming; Parallelizable algorithmic pattern; Heterogeneous system; GPGPU; Deterministic parallel pseudo-random number generator
National Category
Embedded Systems
Identifiers
urn:nbn:se:liu:diva-184112 (URN)
10.1007/s10766-022-00726-5 (DOI)
000771886000001 ()
Note

Funding agencies: EU, European Commission [801015]; CUGS, Linköping University

Available from: 2022-04-07 Created: 2022-04-07 Last updated: 2023-03-14. Bibliographically approved
Keller, J., Litzinger, S. & Kessler, C. (2022). Integrating Energy-Optimizing Scheduling of Moldable Streaming Tasks with Design Space Exploration for Multiple Core Types on Configurable Platforms. Journal of Signal Processing Systems, 94, 849-864
2022 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 94, p. 849-864. Article in journal (Refereed). Published.
Abstract [en]

Design space exploration of a configurable, heterogeneous system for a given application with required throughput searches for a combination of cores or softcores with different architectures that can be accommodated within the given ASIC or FPGA area and that achieves the required throughput and optimizes power consumption. For a soft real-time streaming application, modeled as a task graph with internally parallelizable streaming tasks, this requires assigning a core type and quantity and DVFS frequency level to each task, which implies task runtime and energy consumption, and mapping and scheduling the tasks, such that the throughput requirement is met. We tightly integrate such static scheduling for stream processing applications with design space exploration of the best heterogeneous core combination, and solve the resulting combined optimization problem by an integer linear program (ILP). We evaluate our solution for different numbers of core types on synthetic and application-based task graphs, and demonstrate improvements of up to 34.8% for ARM big and LITTLE cores, and 70.5% for 3 different core types.
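
A heavily simplified version of the combined selection-and-scheduling problem can be written as an ILP of roughly the following shape. The notation (binary selection variables x, energy E, runtime tau, period bound M, core counts n_c, per-core area A_c, area budget A_max) is assumed here for illustration and is not the paper's exact formulation.

    \min \sum_{t,c,w,f} x_{t,c,w,f}\, E(t,c,w,f) \quad \text{subject to}
    \sum_{c,w,f} x_{t,c,w,f} = 1 \quad \forall t \quad \text{(each task gets one core type, width and DVFS level)},
    \sum_{t,w,f} x_{t,c,w,f}\, w\, \tau(t,c,w,f) \le M\, n_c \quad \forall c \quad \text{(core-time of type } c \text{ fits within the period)},
    \sum_{c} n_c\, A_c \le A_{\max} \quad \text{(the selected cores fit the ASIC/FPGA area budget)}.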

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Design space exploration; Task scheduling; Energy efficiency
National Category
Computer Engineering
Identifiers
urn:nbn:se:liu:diva-187498 (URN)
10.1007/s11265-022-01787-y (DOI)
000819286800002 ()
Note

Funding agencies: Projekt DEAL

Available from: 2022-08-25 Created: 2022-08-25 Last updated: 2023-05-16. Bibliographically approved
Keller, J., Litzinger, S. & Kessler, C. (2021). Combining Design Space Exploration with Task Scheduling of Moldable Streaming Tasks on Reconfigurable Platforms. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (Ed.). Paper presented at the International Symposium on Applied Reconfigurable Computing (ARC 2021) (pp. 97-107). Springer Berlin/Heidelberg
2021 (English). In: / [ed] Derrien, S., Hannig, F., Diniz, P.C., Chillet, D., Springer Berlin/Heidelberg, 2021, p. 97-107. Conference paper, Published paper (Refereed).
Abstract [en]

Design space exploration can be used to find a power-efficient architectural design for a given application, such as the best suited configuration of a heterogeneous system from soft cores of different types, given area and throughput constraints. We show how to integrate design space exploration into a static scheduling algorithm for a streaming task graph application with parallelizable tasks and solve the resulting combined optimization problem by an integer linear program (ILP). We demonstrate the improvements by our strategy with ARM big and LITTLE soft cores and synthetic task graphs.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2021
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12700
Keywords
Design space exploration, Task scheduling, Energy efficiency
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-184902 (URN)
10.1007/978-3-030-79025-7_7 (DOI)
Conference
International Symposium on Applied Reconfigurable Computing (ARC 2021)
Projects
ELLIIT
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications, GPAI
Available from: 2022-05-10 Created: 2022-05-10 Last updated: 2022-05-10
Kessler, C., Keller, J. & Litzinger, S. (2021). Temperature-Aware Energy-Optimal Scheduling of Moldable Streaming Tasks onto 2D-Mesh-Based Many-Core CPUs with DVFS. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (Ed.), JSSPP 2021: Job Scheduling Strategies for Parallel Processing. Paper presented at the 24th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), online, May 21, 2021 (pp. 168-189). Springer Berlin/Heidelberg, 12985
2021 (English). In: JSSPP 2021: Job Scheduling Strategies for Parallel Processing / [ed] Klusáček, D., Cirne, W., Rodrigo, G.P., Springer Berlin/Heidelberg, 2021, Vol. 12985, p. 168-189. Conference paper, Published paper (Refereed).
Abstract [en]

We consider the problem of energy-optimally mapping a set of moldable-parallel tasks in the steady-state pattern of a software-pipelined streaming computation onto a generic many-core CPU architecture with a 2D mesh geometry, where the execution voltage and frequency levels of the cores can be selected dynamically from a given set of discrete DVFS levels. We extend the Crown Scheduling technique for parallelizable tasks to temperature-aware scheduling, taking into account the tasks’ heat generation, the heat limit for each core, and the heat diffusion along the 2D mesh geometry of typical many-core CPU architectures. Our approach introduces a systematic method for alternating task executions between disjoint “buddy” core groups in subsequent iterations of crown schedules to avoid long-time overheating of cores. We present two integer linear program (ILP) solutions with different degrees of flexibility, and show that these can be solved for realistic problem sizes with today’s ILP solver technology. Experiments with several streaming task graphs derived from real-world applications show that the flexibility for the scheduler can be greatly increased by considering buddy-cores, thus finding feasible solutions in scenarios that could not be solved otherwise. We also present a fast heuristic for the same problem.
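
In simplified form, the buddy-core idea adds constraints of roughly the following shape to the scheduling ILP. The notation (binary variables y mapping task t to core group g in even or odd iterations, buddy group b(g), heat contribution h, heat limit H_max) is assumed here for illustration only and is not the paper's exact formulation.

    y^{\mathrm{even}}_{t,g} = y^{\mathrm{odd}}_{t,\,b(g)} \quad \forall t, g \quad \text{(a task alternates between group } g \text{ and its disjoint buddy group } b(g)\text{)},
    \sum_{t} \sum_{g \ni k} y^{\mathrm{even}}_{t,g}\, h(t,g,k) \le H_{\max} \quad \forall \text{ cores } k \quad \text{(steady-state heat on every core stays below the limit; analogously for odd iterations)}.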

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2021
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12985
Keywords
Temperature-aware scheduling, Parallelizable tasks, Many-core CPU, Energy optimization, DVFS
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-184903 (URN)
10.1007/978-3-030-88224-2_9 (DOI)
000869960400009 ()
9783030882235 (ISBN)
9783030882242 (ISBN)
Conference
24th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), online, May 21, 2021
Projects
ELLIIT GPAI
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications, GPAI
Note

Funding: ELLIIT

Available from: 2022-05-10 Created: 2022-05-10 Last updated: 2022-11-16
Melot, N., Kessler, C., Eitschberger, P. & Keller, J. (2019). Co-optimizing Core Allocation, Mapping and DVFS in Streaming Programs with Moldable Tasks for Energy Efficient Execution on Manycore Architectures. In: 2019 19th International Conference on Application of Concurrency to System Design (ACSD). Paper presented at the 19th International Conference on Application of Concurrency to System Design (ACSD-2019), Aachen, Germany, June 23-28, 2019 (pp. 63-72).
2019 (English). In: 2019 19th International Conference on Application of Concurrency to System Design (ACSD), 2019, p. 63-72. Conference paper, Published paper (Refereed).
Abstract [en]

Stream programming abstracts parallelism complexity by modeling a program as a set of streaming tasks. Tasks run repeatedly and can even be internally parallel, i.e., use one or multiple cores simultaneously (moldable). The throughput of the streaming application, as well as its energy consumption, depends strongly on scheduling, i.e., on how tasks are mapped to cores, and on the frequency at which they run. Crown scheduling is a scheduling method that reduces this problem's combinatorial complexity considerably by introducing a few additional restrictions especially on tasks' core allocation sizes and mapping. While it has previously been shown to outperform competing methods, the impact of these restrictions on the schedule quality has, up to now, never been analyzed quantitatively. In this paper, we first propose several crown scheduler improvements toward fewer restrictions. Also, we provide an Integer Linear Programming formulation that solves the same optimization problem without the inherent restrictions of crown scheduling. While in an extreme case an unrestricted schedule might use 3.7 times less energy than a crown schedule for a realistic execution platform model, we show that in practical benchmarks the difference is small while crown schedulers are significantly faster than unrestricted scheduling. We experimentally confirm this with benchmarks derived from random task collections, classic parallel algorithms as well as the Streamit benchmark suite.
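
The structural restriction that crown scheduling imposes can be made concrete: for p cores (p a power of two), a task may only be allocated to one of the 2p-1 hierarchical, power-of-two-sized core groups rather than to an arbitrary subset of cores. The snippet below, included purely as an illustration and not taken from the paper, enumerates these groups.

    #include <cstdio>

    // Enumerate the 2p-1 core groups of a balanced binary crown over p cores.
    // Every group has power-of-two size and covers a contiguous range of cores.
    void print_crown_groups(int p)   // p is assumed to be a power of two
    {
        for (int size = p; size >= 1; size /= 2)
            for (int first = 0; first < p; first += size)
                std::printf("group: cores %d..%d (size %d)\n",
                            first, first + size - 1, size);
    }

    int main() { print_crown_groups(8); }  // prints 15 groups for p = 8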

Keywords
parallel computing, energy efficiency, scheduling, mapping, resource allocation, dynamic voltage and frequency scaling, combinatorial optimization, integer linear programming, crown scheduling
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-168602 (URN)
10.1109/ACSD.2019.00011 (DOI)
978-1-7281-3843-5 (ISBN)
Conference
19th International Conference on Application of Concurrency to System Design (ACSD-2019), Aachen, Germany, June 23-28, 2019
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2020-08-27. Bibliographically approved
Keller, J. & Kessler, C. (2019). Dealing with Hardware Faults in Energy-Efficient Static Schedules of Multi-Variant Programs on Heterogeneous Platforms. In: C. Trinitis, T. Pionteck (Ed.), 32nd GI/ITG International Conference on Architecture of Computing Systems, May 20-21, 2019, Technical University of Denmark, Copenhagen, Denmark, Workshop Proceedings. Paper presented at the 15th GI/ITG Workshop on Dependability and Fault Tolerance (VERFE'19), Copenhagen, Denmark, 20-21 May 2019, co-located with the ARCS'19 conference. VDE Verlag GmbH
2019 (English). In: 32nd GI/ITG International Conference on Architecture of Computing Systems, May 20-21, 2019, Technical University of Denmark, Copenhagen, Denmark, Workshop Proceedings / [ed] C. Trinitis, T. Pionteck, VDE Verlag GmbH, 2019. Conference paper, Published paper (Refereed).
Abstract [en]

We investigate the energy-efficient execution of programs with a sequence of program parts, each part executable by multiple variants on different execution units. We study their behaviour under the presence of crash faults on a computing platform with heterogeneous execution units like multicore, GPU, and FPGA. To this end, we extend a static scheduling algorithm for computing the sequence of variants leading to minimum runtime, minimum energy consumption, or a weighted sum of both, to consider cases where one or more program variants cannot be used anymore from some execution point on, due to failure of the underlying execution unit(s). This extension combines the advantageous results of static scheduling, known in the fault-free case, with avoidance of overhead for re-scheduling in case of a fault. We evaluate our algorithm with synthetically generated program task graphs. The results indicate that, compared to computing a new schedule for each fault case, our algorithm only needs 55% of the scheduling time for 8 variants.
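
The combined objective mentioned above can be illustrated with a weighted-sum formulation; the symbols (runtime T, energy E, weight alpha in [0,1]) are generic notation chosen for this sketch rather than the paper's.

    \min_{\text{schedule}} \; \alpha\, T(\text{schedule}) + (1 - \alpha)\, E(\text{schedule}),
    \qquad \alpha = 1 \text{ recovers pure runtime minimization}, \quad \alpha = 0 \text{ pure energy minimization}.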

Place, publisher, year, edition, pages
VDE Verlag GmbH, 2019
Keywords
parallel computing, fault tolerance, energy efficiency, scheduling, heterogeneous computer systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-168601 (URN)
978-3-8007-4957-7 (ISBN)
978-3-8007-4958-4 (ISBN)
Conference
15th GI/ITG Workshop on Dependability and Fault Tolerance (VERFE'19), Copenhagen, Denmark, 20-21 May 2019. Co-located with ARCS'19 conference.
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2020-08-26
Kessler, C. (2019). Global Optimization of Operand Transfer Fusion in Heterogeneous Computing. In: SCOPES '19: Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems. Paper presented at the 22nd International Workshop on Software and Compilers for Embedded Systems (SCOPES-2019), St. Goar, Germany, May 2019 (pp. 49-58). Association for Computing Machinery (ACM)
2019 (English). In: SCOPES '19: Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, Association for Computing Machinery (ACM), 2019, p. 49-58. Conference paper, Published paper (Refereed).
Abstract [en]

We consider the problem of minimizing, for a dataflow graph of kernel calls, the overall number of operand data transfers, and thus, the accumulated transfer startup overhead, in heterogeneous systems with non-shared memory. Our approach analyzes the kernel-operand dependence graph and reorders the operand arrays in memory such that transfers and memory allocations of multiple operands adjacent in memory can be merged, saving transfer startup costs and memory allocation overheads.
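
The effect targeted by the optimization can be illustrated with a small CUDA host-code sketch: if two operand arrays needed by a kernel are laid out back to back in one allocation, their two host-to-device transfers, each paying a startup overhead, collapse into a single transfer. The calls used (cudaMalloc, cudaMemcpy) are standard CUDA runtime API; the fusion shown is a hand-made illustration of the idea, not the paper's implementation.

    #include <cuda_runtime.h>
    #include <cstddef>
    #include <vector>

    // Unfused: two allocations and two transfers, hence two startup overheads.
    void copy_unfused(const std::vector<float>& a, const std::vector<float>& b,
                      float** d_a, float** d_b)
    {
        cudaMalloc((void**)d_a, a.size() * sizeof(float));
        cudaMalloc((void**)d_b, b.size() * sizeof(float));
        cudaMemcpy(*d_a, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(*d_b, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);
    }

    // Fused: operands placed back to back on the host are moved with a single
    // allocation and a single transfer; the kernel receives offsets into it.
    void copy_fused(const std::vector<float>& ab,   // operand a followed by operand b
                    std::size_t a_len, float** d_ab, float** d_a, float** d_b)
    {
        cudaMalloc((void**)d_ab, ab.size() * sizeof(float));
        cudaMemcpy(*d_ab, ab.data(), ab.size() * sizeof(float), cudaMemcpyHostToDevice);
        *d_a = *d_ab;            // operand a starts at the beginning of the fused buffer
        *d_b = *d_ab + a_len;    // operand b starts right after a
    }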

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
heterogeneous computing, program optimization, GPU, memory mapping, kernel dataflow graph, Hamiltonian path, message fusion, allocation fusion, distributed memory, data transfer fusion, CUDA
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-168603 (URN)
10.1145/3323439.3323981 (DOI)
9781450367622 (ISBN)
Conference
22nd International Workshop on Software and Compilers for Embedded Systems (SCOPES-2019), St. Goar, Germany, May 2019
Funder
EU, Horizon 2020, 801015
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2020-08-26
Litzinger, S., Keller, J. & Kessler, C. (2019). Scheduling Moldable Parallel Streaming Tasks on Heterogeneous Platforms with Frequency Scaling. In: 2019 27th European Signal Processing Conference (EUSIPCO). Paper presented at the 2019 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain, Sep. 2019 (pp. 1-5). IEEE
2019 (English). In: 2019 27th European Signal Processing Conference (EUSIPCO), IEEE, 2019, p. 1-5. Conference paper, Published paper (Refereed).
Abstract [en]

We extend static scheduling of parallelizable tasks to machines with multiple core types, taking differences in performance and power consumption due to task type into account. Next to energy minimization for given deadline, i.e. for given throughput requirement, we consider makespan minimization for given energy or average power budgets. We evaluate our approach by comparing schedules of synthetic task sets for big.LITTLE with other schedulers from literature. We achieve an improvement of up to 33%.
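
The two optimization directions mentioned can be summarized as a pair of related problems; the notation (makespan T, deadline D, energy E and its budget) is generic and serves only to illustrate the text.

    \min E \;\; \text{s.t.} \;\; T \le D \qquad \text{(minimize energy for a given deadline, i.e. a given throughput requirement)},
    \min T \;\; \text{s.t.} \;\; E \le E_{\mathrm{budget}} \qquad \text{(minimize makespan for a given energy budget; an average power budget bounds } E/T \text{ instead)}.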

Place, publisher, year, edition, pages
IEEE, 2019
Series
27th European Signal Processing Conference (EUSIPCO), ISSN 2076-1465
Keywords
parallel computing, scheduling, mapping, energy efficiency, heterogeneous computer system, integer linear programming, combinatorial optimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-168600 (URN)
10.23919/EUSIPCO.2019.8903180 (DOI)
000604567700497 ()
2-s2.0-8507559932 (Scopus ID)
978-9-0827-9703-9 (ISBN)
Conference
2019 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain, Sep. 2019
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2024-01-31
Memeti, S., Li, L., Pllana, S., Kolodziej, J. & Kessler, C. (2017). Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption. In: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing. Paper presented at the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC'17), Washington, DC, USA (pp. 1-6). Association for Computing Machinery (ACM)
2017 (English). In: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, Association for Computing Machinery (ACM), 2017, p. 1-6. Conference paper, Published paper (Refereed).
Abstract [en]

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general purpose CPUs and accelerators (such as, GPU, or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as, OpenMP, OpenCL, OpenACC, CUDA) and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate the programming productivity we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 Processors with a GPU accelerator or an Intel Xeon Phi co-processor.
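
To give a flavor of the per-framework code differences that such a productivity comparison counts, the same SAXPY loop is shown below annotated for OpenMP and for OpenACC. These are generic textbook forms chosen for illustration; they are not taken from the SPEC or Rodinia benchmarks used in the paper.

    #include <cstddef>

    // OpenMP: one directive line parallelizes the loop across CPU threads.
    void saxpy_omp(std::size_t n, float alpha, const float* x, float* y)
    {
        #pragma omp parallel for
        for (long i = 0; i < (long)n; ++i)
            y[i] = alpha * x[i] + y[i];
    }

    // OpenACC: one directive line offloads the loop to an accelerator;
    // data clauses describe which array sections must be copied in and out.
    void saxpy_acc(std::size_t n, float alpha, const float* x, float* y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (long i = 0; i < (long)n; ++i)
            y[i] = alpha * x[i] + y[i];
    }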

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017
Keywords
heterogeneous computing, parallel computing, parallel programming models, comparative study, OpenCL, OpenACC, OpenMP, CUDA, Programming productivity, Performance, Energy consumption, GPU, Xeon-Phi, MeterPU
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-168604 (URN)
10.1145/3110355.3110356 (DOI)
9781450351164 (ISBN)
Conference
2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC'17), Washington, DC, USA
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2020-08-27
Identifiers
ORCID iD: orcid.org/0000-0001-5241-0026
