liu.se: Search for publications in DiVA
1 - 17 of 17
  • 1.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system.
    Adaptive Implementation Selection in the SkePU Skeleton Programming Library. 2013. In: Advanced Parallel Processing Technologies (APPT-2013), Proceedings / [ed] Chengyung Wu and Albert Cohen, 2013, pp. 170-183. Conference paper (Refereed)
    Abstract [en]

    In earlier work, we have developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL targeting either CPU or GPU execution respectively. Deciding which implementation would run faster for a given skeleton call depends upon the computation, problem size(s), system architecture and data locality.

    In this paper, we present our work on automatic selection between these implementation variants by an offline machine learning method which generates a compact decision tree with low training overhead. The proposed selection mechanism is flexible yet high-level allowing a skeleton programmer to control different training choices at a higher abstraction level. We have evaluated our optimization strategy with 9 applications/kernels ported to our skeleton library and achieve on average more than 94% (90%) accuracy with just 0.53% (0.58%) training space exploration on two systems. Moreover, we discuss one application scenario where local optimization considering a single skeleton call can prove sub-optimal, and propose a heuristic for bulk implementation selection considering more than one skeleton call to address such application scenarios.

  • 2.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    The PEPPHER composition tool: performance-aware composition for GPU-based systems. 2014. In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 96, no. 12, pp. 1195-1211. Journal article (Refereed)
    Abstract [en]

    The PEPPHER (EU FP7 project) component model defines the notion of component, interface and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores the application’s components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code to optimize performance. We discuss the concept of smart containers and its benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations and runtime optimization of data transfers. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath for different usage scenarios on GPU-based systems.

  • 3.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-Based Systems. 2012. In: High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, IEEE, 2012, pp. 711-720. Conference paper (Refereed)
    Abstract [en]

    The PEPPHER component model defines an environment for annotation of native C/C++ based components for homogeneous and heterogeneous multicore and manycore systems, including GPU and multi-GPU based systems. For the same computational functionality, captured as a component, different sequential and explicitly parallel implementation variants using various types of execution units might be provided, together with metadata such as explicitly exposed tunable parameters. The goal is to compose an application from its components and variants such that, depending on the run-time context, the most suitable implementation variant will be chosen automatically for each invocation. We describe and evaluate the PEPPHER composition tool, which explores the application's components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code. With several applications, we demonstrate how the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath.

  • 4.
    Ernstsson, August
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems. 2018. In: International journal of parallel programming, ISSN 0885-7458, E-ISSN 1573-7640, Vol. 46, no. 1, pp. 62-80. Journal article (Refereed)
    Abstract [en]

    In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible and type-safe, interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be caught earlier and easier thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.

  • 5.
    Henrio, Ludovic
    Univ Cote Azur, France.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Ensuring Memory Consistency in Heterogeneous Systems Based on Access Mode Declarations. 2018. In: PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING and SIMULATION (HPCS), IEEE, 2018, pp. 716-723. Conference paper (Refereed)
    Abstract [en]

    Running a program on disjoint memory spaces requires addressing memory consistency issues and performing transfers so that the program always accesses the right data. Several approaches exist to ensure the consistency of the memory accessed; here we are interested in the verification of a declarative approach where each component of a computation is annotated with an access mode declaring which part of the memory is read or written by the component. The programming framework uses the component annotations to guarantee the validity of the memory accesses. This is the mechanism used in VectorPU, a C++ library for programming CPU-GPU heterogeneous systems, and this article proves the correctness of the software cache-coherence mechanism used in the library. Beyond the scope of VectorPU, this article can be considered a simple and effective formalisation of memory consistency mechanisms based on the explicit declaration of the effect of each component on each memory space.

  • 6.
    Henrio, Ludovic
    Univ Claude Bernard Lyon 1, France; Univ Cote Azur, France.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Leveraging access mode declarations in a model for memory consistency in heterogeneous systems. 2020. In: The Journal of logical and algebraic methods in programming, ISSN 2352-2208, E-ISSN 2352-2216, Vol. 110, article id UNSP 100498. Journal article (Refereed)
    Abstract [en]

    On a system that exposes disjoint memory spaces to the software, a program has to address memory consistency issues and perform data transfers so that it always accesses valid data. Several approaches exist to ensure the consistency of the memory accessed. Here we are interested in the verification of a declarative approach where each component of a computation is annotated with an access mode declaring which part of the memory is read or written by the component. The programming framework uses the component annotations to guarantee the validity of the memory accesses. This is the mechanism used in VectorPU, a C++ library for programming CPU-GPU heterogeneous systems. This article proves the correctness of the software cache-coherence mechanism used in VectorPU. Beyond the scope of VectorPU, this article provides a simple and effective formalization of memory consistency mechanisms based on the explicit declaration of the effect of each component on each memory space. The formalism we propose also takes into account arrays for which a single validity status is stored for the whole array; additional mechanisms for dealing with overlapping arrays are also studied. (C) 2019 Elsevier Inc. All rights reserved.

  • 7.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers. 2014. In: Proc. First Workshop on Resource awareness and adaptivity in multi-core computing (Racing 2014), May 2014, Paderborn, Germany / [ed] F. Hannig and J. Teich, 2014, pp. 43-48. Conference paper (Refereed)
    Abstract [en]

    In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software components, skeletons (i.e., predefined generic components) and containers, and discuss the optimization problems that need to be considered in selecting among multiple implementation variants, generating code and providing runtime support for efficient execution on such systems.

  • 8.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Atalar, Aras
    Chalmers University of Technology, Gothenburg, Sweden.
    Dobre, Alin
    Movidius, Dublin, Ireland.
    XPDL: Extensible Platform Description Language to Support Energy Modeling and Optimization. 2015. In: Proc. 44th International Conference on Parallel Processing Workshops, ICPP-EMS Embedded Multicore Systems, in conjunction with ICPP-2015, Beijing, 1-4 Sep. 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp. 51-60. Conference paper (Refereed)
    Abstract [en]

    We present XPDL, a modular, extensible platform description language for heterogeneous multicore systems and clusters. XPDL specifications provide platform metadata about hardware and installed system software that are relevant for the adaptive static and dynamic optimization of application programs and system settings for improved performance and energy efficiency. XPDL is based on XML and uses hyperlinks to create distributed libraries of platform metadata specifications. We also provide first components of a retargetable toolchain that browses and processes XPDL specifications, and generates driver code for microbenchmarking to bootstrap empirical performance and energy models at deployment time. A C++ based API enables convenient introspection of platform models, even at run-time, which allows for adaptive dynamic program optimizations such as tuned selection of implementation variants.

  • 9.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems. 2018. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    CPU/GPU heterogeneous systems have shown remarkable advantages in performance and energy consumption compared to homogeneous ones such as standard multi-core systems. Such heterogeneity represents one of the most promising trends for the near-future evolution of high performance computing hardware. However, as a double-edged sword, the heterogeneity also brings significant programming complexities that prevent the easy and efficient usage of different such heterogeneous systems. In this thesis, we are interested in four such kinds of fundamental complexities that are associated with these heterogeneous systems: measurement complexity (efforts required to measure a metric, e.g., measuring energy), CPU-GPU selection complexity, platform complexity and data management complexity. We explore new low-cost programming abstractions to hide these complexities, and propose new optimization techniques that could be performed under the hood.

    For the measurement complexity, although measuring time is trivial with native library support, measuring energy consumption, especially for systems with GPUs, is complex because of the low-level details involved such as choosing the right measurement methods, handling the trade-off between sampling rate and accuracy, and switching to different measurement metrics. We propose a clean interface with its implementation that not only hides the complexity of energy measurement, but also unifies different kinds of measurements. The unification bridges the gap between time measurement and energy measurement, and if no metric-specific assumptions related to time optimization techniques are made, energy optimization can be performed by blindly reusing time optimization techniques.

    For the CPU-GPU selection complexity, which relates to efficient utilization of heterogeneous hardware, we propose a new adaptive-sampling based construction mechanism of predictors for such selections which can adapt to different hardware platforms automatically, and shows non-trivial advantages over random sampling.

    For the platform complexity, we propose a new modular platform modeling language and its implementation to formally and systematically describe a computer system, enabling zero-overhead platform information queries for high-level software tool chains and for programmers as a basis for making software adaptive.

    For the data management complexity, we propose a new mechanism to enable a unified memory view on heterogeneous systems that have separate memory spaces. This mechanism enables programmers to write significantly less code, which runs as fast as expert-written code and outperforms the current commercially available solution: Nvidia's Unified Memory. We further propose two data movement optimization techniques, lazy allocation and transfer fusion optimization. The two techniques are based on adaptively merging messages to reduce data transfer latency. We show that these techniques can be potentially beneficial and we prove that our greedy fusion algorithm is optimal.

    Finally, we show that our approaches to handle different complexities can be combined so that programmers could use them simultaneously.

    This research was partly funded by two EU FP7 projects (PEPPHER and EXCESS) and SeRC.

  • 10.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Adaptive Off-Line Tuning for Optimized Composition of Components for Heterogeneous Many-Core Systems. 2013. In: High Performance Computing for Computational Science - VECPAR 2012 / [ed] Dayde, Michel, Marques, Osni, Nakajima, Kengo, Springer, 2013, pp. 329-345. Conference paper (Refereed)
    Abstract [en]

    In recent years heterogeneous multi-core systems have been given much attention. However, performance optimization on these platforms remains a big challenge. Optimizations performed by compilers are often limited due to lack of dynamic information and run-time environment, which often makes applications not performance portable. One current approach is to provide multiple implementations for the same interface that could be used interchangeably depending on the call context, and expose the composition choices to a compiler, deployment-time composition tool and/or run-time system. Using off-line machine-learning techniques makes it possible to improve the precision and reduce the run-time overhead of run-time composition, and leads to an improvement of performance portability. In this work we extend the run-time composition mechanism in the PEPPHER composition tool by off-line composition and present an adaptive machine learning algorithm for generating compact and efficient dispatch data structures with low training time. As dispatch data structure we propose an adaptive decision tree structure, which implies an adaptive training algorithm that makes it possible to control the trade-off between training time, dispatch precision and run-time dispatch overhead.

    We have evaluated our optimization strategy with simple kernels (matrix-multiplication and sorting) as well as applications from RODINIA benchmark on two GPU-based heterogeneous systems. On average, the precision for composition choices reaches 83.6 percent with approximately 34 minutes off-line training time.

  • 11.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Pruning strategies in adaptive off-line tuning for optimized composition of components on heterogeneous systems. 2016. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 51, pp. 37-45. Journal article (Refereed)
    Abstract [en]

    Adaptive program optimizations, such as automatic selection of the expected fastest implementation variant for a computation component depending on hardware architecture and runtime context, are important especially for heterogeneous computing systems but require good performance models. Empirical performance models which require little or no human effort show more practical feasibility if the sampling and training cost can be reduced to a reasonable level. In previous work we proposed an early version of adaptive sampling for efficient exploration and selection of training samples, which yields a decision-tree based method for representing, predicting and selecting the fastest implementation variants for given run-time call context property values. For adaptive pruning we use a heuristic convexity assumption. In this paper we consolidate and improve the method by new pruning techniques to better support the convexity assumption and control the trade-off between sampling time, prediction accuracy and runtime prediction overhead. Our results show that the training time can be reduced by up to 39 times without noticeable prediction accuracy decrease. (C) 2015 Elsevier B.V. All rights reserved.

  • 12.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Pruning strategies in adaptive off-line tuning for optimized composition of components on heterogeneous systems. 2014. In: 2014 43rd International Conference on Parallel Processing Workshops (ICCPW), IEEE conference proceedings, 2014, pp. 255-264. Conference paper (Refereed)
    Abstract [en]

    Adaptive program optimizations, such as automatic selection of the expected fastest implementation variant for a computation component depending on runtime context, are important especially for heterogeneous computing systems but require good performance models. Empirical performance models based on trial executions which require little or no human effort show more practical feasibility if the sampling and training cost can be reduced to a reasonable level. In previous work we proposed an early version of an adaptive pruning algorithm for efficient selection of training samples, a decision-tree based method for representing, predicting and selecting the fastest implementation variants for given run-time call context properties, and a composition tool for building the overall composed application from its components. For adaptive pruning we use a heuristic convexity assumption. In this paper we consolidate and improve the method by new pruning techniques to better support the convexity assumption and better control the trade-off between sampling time, prediction accuracy and runtime prediction overhead. Our results show that the training time can be reduced by up to 39 times without noticeable prediction accuracy decrease. Furthermore, we evaluate the effect of combinations of pruning strategies and compare our adaptive sampling method with random sampling. We also use our smart-sampling method as a preprocessor to a state-of-the-art decision tree learning algorithm and compare the result to the predictor directly calculated by our method.

  • 13.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Lazy Allocation and Transfer Fusion Optimization for GPU-based Heterogeneous Systems. 2018. In: 2018 26TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2018), IEEE, 2018, pp. 311-315. Conference paper (Refereed)
    Abstract [en]

    We present two memory optimization techniques which improve the efficiency of data transfer over the PCIe bus for GPU-based heterogeneous systems, namely lazy allocation and transfer fusion optimization. Both are based on merging data transfers so that less overhead is incurred, thereby increasing transfer throughput and making accelerator usage profitable also for smaller operand sizes. We provide the design and prototype implementation of the two techniques in CUDA. Microbenchmarking results show that especially for smaller and medium-sized operands significant speedups can be achieved. We also prove that our transfer fusion optimization algorithm is optimal.

  • 14.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    MeterPU: A Generic Measurement Abstraction API Enabling Energy-tuned Skeleton Backend Selection. 2015. In: Trustcom/BigDataSE/ISPA, 2015 IEEE, IEEE Press, 2015, Vol. 3, pp. 154-159. Conference paper (Refereed)
    Abstract [en]

    We present MeterPU, an easy-to-use, generic and low-overhead abstraction API for taking measurements of various metrics (time, energy) on different hardware components (e.g. CPU, DRAM, GPU), using pluggable platform-specific measurement implementations behind a common interface in C++. We show that with MeterPU, not only legacy (time) optimization frameworks, such as autotuned skeleton back-end selection, can be easily retargeted for energy optimization, but also switching different optimization goals for arbitrary code sections now becomes trivial. We apply MeterPU to implement the first energy-tunable skeleton programming framework, based on the SkePU skeleton programming library.

  • 15.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    MeterPU: a generic measurement abstraction API: Enabling energy-tuned skeleton backend selection. 2018. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no. 11, pp. 5643-5658. Journal article (Refereed)
    Abstract [en]

    We present MeterPU, an easy-to-use, generic and low-overhead abstraction API for taking measurements of various metrics (time, energy) on different hardware components (e.g., CPU, DRAM, GPU) in a heterogeneous computer system, using pluggable platform-specific measurement implementations behind a common interface in C++. We show that with MeterPU, not only legacy (time) optimization frameworks, such as autotuned skeleton back-end selection, can be easily retargeted for energy optimization, but also switching between measurement metrics or techniques for arbitrary code sections now becomes trivial. We apply MeterPU to implement the first energy-tunable skeleton programming framework, based on the SkePU skeleton programming library.

  • 16.
    Sjöström, Oskar
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Ko, Soon Heum
    Linköpings universitet, Nationellt superdatorcentrum (NSC).
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Portable Parallelization of the EDGE CFD Application for GPU-based Systems using the SkePU Skeleton Programming Library. 2016. In: Parallel Computing: On the Road to Exascale / [ed] Gerhard R. Joubert; Hugh Leather; Mark Parsons; Frans Peters; Mark Sawyer, IOS Press, 2016, pp. 135-144. Conference paper (Refereed)
    Abstract [en]

    EDGE is a complex application for computational fluid dynamics used e.g. for aerodynamic simulations in the avionics industry. In this work we present the portable, high-level parallelization of EDGE for execution on multicore CPU and GPU based systems by using the multi-backend skeleton programming library SkePU. We first expose the challenges of applying portable high-level parallelization to a complex scientific application for a heterogeneous (GPU-based) system using (SkePU) skeletons and discuss the encountered flexibility problems that usually do not show up in skeleton toy programs. We then identify and implement the improvements that SkePU needs in order to become applicable to applications containing computations on complex data structures and with irregular data access. In particular, we improve the MapArray skeleton and provide a new MultiVector container for operand data that can be used with unstructured grid data structures. Although there is no SkePU skeleton specifically dedicated to handling computations on unstructured grids and their data structures, we still obtain portable speedup of EDGE with both multicore CPU and GPU execution by using the improved MapArray skeleton of SkePU.

  • 17.
    Thorarensen, Sebastian
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska fakulteten.
    Cuello, Rosandra
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska fakulteten.
    Kessler, Christoph
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Li, Lu
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Barry, Brendan
    Movidius Ltd, Ireland.
    Efficient Execution of SkePU Skeleton Programs on the Low-power Multicore Processor Myriad2. 2016. In: 2016 24TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP), IEEE, 2016, pp. 398-402. Conference paper (Refereed)
    Abstract [en]

    SkePU is a state-of-the-art skeleton programming library for high-level portable programming and efficient execution on heterogeneous parallel computer systems, with a publicly available implementation for general-purpose multicore CPU and multi-GPU systems. This paper presents the design, implementation and evaluation of a new back-end of the SkePU skeleton programming library for the new low-power multicore processor Myriad2 by Movidius Ltd. This enables seamless code portability of SkePU applications across both HPC and embedded (Myriad2) parallel computing systems, with decent performance, on these architecturally very diverse types of execution platforms.
