liu.se: Search for publications in DiVA
Results 101 - 111 of 111
  • 101.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Joerg
    FernUniversität in Hagen, Germany (Parallelität und VLSI).
    Eitschberger, Patrick
    FernUniversität in Hagen, Germany (Parallelität und VLSI).
    Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems. 2015. In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, Vol. 11, no. 4, p. 62. Article in journal (Refereed)
    Abstract [en]

    Effectively exploiting massively parallel architectures is a major challenge that stream programming can help to address. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks run concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks, and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider crown scheduling, a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic Longest Task, Lowest Group (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the Height heuristic for crown frequency scaling. We use them in feedback-loop heuristics based on binary search and simulated annealing to optimize crown allocation.

    Our experimental evaluation of the ILP models for a generic manycore architecture shows that, at least for small and medium-sized streaming task collections, even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespans and energy consumption close to optimal within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than that of the other algorithms we test, but our heuristics consistently produce better solutions.
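
    For intuition about the heuristic family mentioned above: classic Longest Processing Time (LPT) mapping, which LTLG generalizes, sorts tasks by decreasing work and greedily places each task on the currently least-loaded core. The sketch below is a generic LPT illustration only; it handles neither moldable tasks nor the crown's processor-group structure, and all names in it are ours, not from the paper.

        // Greedy Longest Processing Time (LPT) mapping: sort tasks by decreasing
        // work, then repeatedly place the next task on the least-loaded core.
        #include <algorithm>
        #include <functional>
        #include <iostream>
        #include <queue>
        #include <utility>
        #include <vector>

        struct Task { int id; double work; };

        // Returns (task id, core) pairs; cores are numbered 0 .. numCores-1.
        std::vector<std::pair<int, int>> lptMap(std::vector<Task> tasks, int numCores) {
            std::sort(tasks.begin(), tasks.end(),
                      [](const Task& a, const Task& b) { return a.work > b.work; });
            using Core = std::pair<double, int>;              // (accumulated load, core id)
            std::priority_queue<Core, std::vector<Core>, std::greater<Core>> heap;
            for (int c = 0; c < numCores; ++c) heap.push({0.0, c});
            std::vector<std::pair<int, int>> mapping;
            for (const Task& t : tasks) {
                Core least = heap.top();                      // least-loaded core so far
                heap.pop();
                mapping.push_back({t.id, least.second});
                least.first += t.work;                        // account for the new task
                heap.push(least);
            }
            return mapping;
        }

        int main() {
            std::vector<Task> tasks = {{0, 8.0}, {1, 5.0}, {2, 4.0}, {3, 3.0}, {4, 2.0}};
            for (auto [task, core] : lptMap(tasks, 2))
                std::cout << "task " << task << " -> core " << core << '\n';
        }

    In crown scheduling, the mapping targets the crown's predefined processor groups rather than individual cores, which is where LTLG departs from plain LPT.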

  • 102.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Jörg
    FernUniversität in Hagen, Fac. of Math. and Computer Science, Hagen, Germany.
    Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures. 2012. Conference paper (Other academic)
    Abstract [en]

    Performance of manycore processors is limited by programs' use of off-chip main memory. Streaming computation organized in a pipeline restricts main-memory accesses to the tasks at the pipeline boundaries, which read input from or write results to main memory. The Single Chip Cloud computer (SCC) offers 48 cores linked by a high-speed on-chip network and allows such an on-chip pipelining technique to be implemented. We assess the performance and constraints of the SCC and investigate on-chip pipelined mergesort as a case study for streaming computations. We found that our on-chip pipelined mergesort yields significant speedup over classic parallel mergesort on the SCC. The technique should bring improvements in power consumption and should be portable to other manycore, network-on-chip architectures such as Tilera's processors.
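
    To picture the on-chip pipelining idea: interior tasks of the pipelined mergesort only receive data from and forward data to other on-chip tasks, so main memory is touched only at the pipeline boundaries. Below is a toy merge node in plain C++; vectors stand in for on-chip channels, and this is an illustration of the dataflow, not SCC code.

        // Toy pipeline merge node: consumes two sorted input streams and forwards
        // one sorted output stream. In the on-chip pipeline, only leaf and root
        // tasks would touch main memory; interior nodes only pass data on-chip.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        std::vector<int> mergeNode(const std::vector<int>& left,
                                   const std::vector<int>& right) {
            std::vector<int> out;
            out.reserve(left.size() + right.size());
            std::size_t i = 0, j = 0;
            while (i < left.size() && j < right.size())
                out.push_back(left[i] <= right[j] ? left[i++] : right[j++]);
            while (i < left.size())  out.push_back(left[i++]);
            while (j < right.size()) out.push_back(right[j++]);
            return out;   // a real stage would stream this downstream chunk by chunk
        }

        int main() {
            std::vector<int> a = {1, 4, 7}, b = {2, 3, 9};       // sorted runs from upstream
            for (int x : mergeNode(a, b)) std::cout << x << ' ';
            std::cout << '\n';
        }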

  • 103.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Keller, Jörg
    FernUniversität in Hagen, Germany.
    Improving Energy-Efficiency of Static Schedules by Core Consolidation and Switching Off Unused Cores. 2016. In: Parallel Computing: On the Road to Exascale / [ed] Gerhard R. Joubert; Hugh Leather; Mark Parsons; Frans Peters; Mark Sawyer, IOS Press, 2016, p. 285-294. Conference paper (Refereed)
    Abstract [en]

    We demonstrate how static, energy-efficient schedules for independent, parallelizable tasks on parallel machines can be improved by modeling idle power when the static power consumption of a core comprises a notable fraction of the core's total power, which is increasingly the case. The improvement is achieved by optimally packing cores when deciding about core allocation, mapping and DVFS for each task, so that all unused cores can be switched off and overall energy usage is minimized. We evaluate our proposal with a benchmark suite of task collections and compare the resulting schedules with those of an optimal scheduler that does not take idle power and core switch-off into account. We find that we can reduce energy consumption by 66% for mostly sequential tasks on many cores and by up to 91% for a realistic multicore processor model.
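
    The effect can be pictured with a toy energy model (the simplified power formula and all numbers below are illustrative assumptions of ours, not the paper's model): if every powered-on core draws static power for the whole schedule length, consolidating the work onto fewer cores and switching the rest off removes the idle cores' static-energy term.

        // Toy energy comparison: spreading work over all cores versus consolidating
        // it onto fewer cores and switching the unused ones off. The simple
        // "static + dynamic power" model and all numbers are illustrative only.
        #include <iostream>

        int main() {
            const double scheduleLength  = 1.0;   // deadline of the static schedule (s)
            const double staticPower     = 0.5;   // W drawn by any core that is powered on
            const double dynamicPower    = 1.0;   // extra W drawn while actually computing
            const double busyCoreSeconds = 2.0;   // total computation across all tasks

            // Variant A: 8 cores stay on; idle cores still burn static power.
            const int coresOnA = 8;
            double energyA = coresOnA * staticPower * scheduleLength
                           + dynamicPower * busyCoreSeconds;

            // Variant B: work consolidated onto 2 cores, the other 6 switched off.
            const int coresOnB = 2;
            double energyB = coresOnB * staticPower * scheduleLength
                           + dynamicPower * busyCoreSeconds;

            std::cout << "spread over 8 cores:   " << energyA << " J\n";   // 6 J
            std::cout << "consolidated, 2 cores: " << energyB << " J\n";   // 3 J
        }

    The paper's schedules additionally co-optimize per-task DVFS levels, which this toy calculation ignores.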

  • 104.
    Mäkelä, Jari-Matti
    et al.
    University of Turku, Finland.
    Hansson, Erik
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Åkesson, Daniel
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Forsell, Martti
    VTT Oulu, Finland.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Leppänen, Ville
    University of Turku, Finland.
    Design of the Language Replica for Hybrid PRAM-NUMA Many-core Architectures. 2012. In: Parallel and Distributed Processing with Applications (ISPA), 2012, IEEE conference proceedings, 2012, p. 697-704. Conference paper (Refereed)
    Abstract [en]

    Parallel programming is widely considered very demanding for an average programmer due to the inherent asynchrony of the underlying parallel architectures. In this paper we describe the main design principles and core features of Replica, a parallel language aimed at high-level programming of a new paradigm of reconfigurable, scalable and powerful synchronous shared memory architectures that promise to make parallel programming radically easier with the help of strict memory consistency and deterministic synchronous execution of hardware threads and multi-operations.

  • 105.
    Shafiee Sarvestani, Amin
    et al.
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Hansson, Erik
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization. 2013. In: International Journal of Parallel Programming, ISSN 0885-7458, E-ISSN 1573-7640, Vol. 41, no. 6, p. 806-824. Article in journal (Refereed)
    Abstract [en]

    We introduce an extensible knowledge-based tool for idiom (pattern) recognition in DSP (digital signal processing) programs. Our tool utilizes functionality provided by the Cetus compiler infrastructure for detecting certain computation patterns that frequently occur in DSP code. We focus on recognizing patterns for for-loops and the statements in their bodies, as these are often the performance-critical constructs in DSP applications for which replacement by highly optimized, target-specific parallel algorithms will be most profitable. For better structuring and efficiency of pattern recognition, we classify patterns by different levels of complexity such that patterns in higher levels are defined in terms of lower-level patterns. The tool works statically on the intermediate representation. For better extensibility and abstraction, most of the structural part of the recognition rules is specified in XML form to separate the tool implementation from the pattern specifications. Information about detected patterns will later be used for optimized code generation by local algorithm replacement, e.g. for the low-power high-throughput multicore DSP architecture ePUMA.
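
    A representative example (our own, not taken from the paper) of the kind of for-loop idiom such a recognizer targets is a dot-product style reduction, whose loop body is a single multiply-accumulate statement:

        // Representative dot-product idiom: a for-loop whose body is a single
        // multiply-accumulate into a scalar. A recognizer matches the loop header
        // and this accumulation statement, and code generation may then replace
        // the whole loop with a highly optimized, target-specific parallel routine.
        #include <cstddef>
        #include <iostream>

        double dotProduct(const double* x, const double* y, std::size_t n) {
            double sum = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                sum += x[i] * y[i];          // the recognizable accumulation statement
            return sum;
        }

        int main() {
            double a[] = {1.0, 2.0, 3.0};
            double b[] = {4.0, 5.0, 6.0};
            std::cout << dotProduct(a, b, 3) << '\n';   // 32
        }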

  • 106.
    Sjöström, Oskar
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Ko, Soon Heum
    Linköping University, National Supercomputer Centre (NSC).
    Dastgeer, Usman
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Li, Lu
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Portable Parallelization of the EDGE CFD Application for GPU-based Systems using the SkePU Skeleton Programming Library. 2016. In: Parallel Computing: On the Road to Exascale / [ed] Gerhard R. Joubert; Hugh Leather; Mark Parsons; Frans Peters; Mark Sawyer, IOS Press, 2016, p. 135-144. Conference paper (Refereed)
    Abstract [en]

    EDGE is a complex application for computational fluid dynamics used e.g. for aerodynamic simulations in the avionics industry. In this work we present the portable, high-level parallelization of EDGE for execution on multicore CPU and GPU based systems using the multi-backend skeleton programming library SkePU. We first expose the challenges of applying portable high-level parallelization to a complex scientific application for a heterogeneous (GPU-based) system using (SkePU) skeletons and discuss the encountered flexibility problems that usually do not show up in skeleton toy programs. We then identify and implement necessary improvements in SkePU to make it applicable to applications containing computations on complex data structures and with irregular data access. In particular, we improve the MapArray skeleton and provide a new MultiVector container for operand data that can be used with unstructured grid data structures. Although there is no SkePU skeleton specifically dedicated to handling computations on unstructured grids and their data structures, we still obtain portable speedup of EDGE with both multicore CPU and GPU execution by using the improved MapArray skeleton of SkePU.
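
    To make the skeleton idea concrete, the sketch below shows a minimal generic stand-in for a map skeleton with an extra whole-array operand, in plain C++. It is only conceptually similar to a MapArray-style skeleton and is not the SkePU API; a real skeleton library would dispatch the loop to its CPU or GPU backend.

        // Minimal stand-in for a data-parallel "map with an extra array operand"
        // skeleton: apply a user function to each element, giving it read access
        // to a whole auxiliary array. NOT the SkePU interface, just the concept.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        template <class T, class F>
        std::vector<T> mapWithArray(const std::vector<T>& elems,
                                    const std::vector<T>& aux, F f) {
            std::vector<T> out(elems.size());
            // A skeleton library would dispatch this loop to a parallel backend
            // (multicore CPU, GPU, ...); here it runs sequentially for clarity.
            for (std::size_t i = 0; i < elems.size(); ++i)
                out[i] = f(elems[i], aux);
            return out;
        }

        int main() {
            std::vector<double> values = {0.0, 1.0, 2.0};
            std::vector<double> table  = {10.0, 20.0, 30.0};
            // User function with indirect access into the auxiliary array, the
            // kind of irregular access that unstructured-grid codes need.
            auto gather = [](double v, const std::vector<double>& tab) {
                return v + tab[static_cast<std::size_t>(v)];
            };
            for (double r : mapWithArray(values, table, gather))
                std::cout << r << ' ';
            std::cout << '\n';
        }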

  • 107.
    Soudris, Dimitrios
    et al.
    Natl Tech Univ Athens, Greece.
    Papadopoulos, Lazaros
    Natl Tech Univ Athens, Greece.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Kehagias, Dionysios D.
    CERTH, Greece.
    Papadopoulos, Athanasios
    CERTH, Greece.
    Seferlis, Panos
    CERTH, Greece.
    Chatzigeorgiou, Alexander
    CERTH, Greece.
    Ampatzoglou, Apostolos
    CERTH, Greece.
    Thibault, Samuel
    Inria Bordeaux, France.
    Namyst, Raymond
    Inria Bordeaux, France.
    Pleiter, Dirk
    Forschungszentrum Jülich, Germany.
    Gaydadjiev, Georgi
    Maxeler Technol Ltd, England.
    Becker, Tobias
    Maxeler Technol Ltd, England.
    Haefele, Matthieu
    Univ Paris Sud, France.
    EXA2PRO programming environment: Architecture and Applications. 2018. In: 2018 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XVIII), Association for Computing Machinery, 2018, p. 202-209. Conference paper (Refereed)
    Abstract [en]

    The EXA2PRO programming environment will integrate a set of tools and methodologies that make it possible to systematically address many exascale computing challenges, including performance, performance portability, programmability, abstraction and reusability, fault tolerance and technical debt. The EXA2PRO tool-chain will enable the efficient deployment of applications on exascale computing systems by integrating high-level software abstractions that offer performance portability and efficient exploitation of the heterogeneity of exascale systems, tools for efficient memory management, optimizations based on trade-offs between various metrics, and fault-tolerance support. Hence, by addressing various aspects of productivity challenges, EXA2PRO is expected to have significant impact on the transition to exascale computing, as well as impact from the perspective of applications. The evaluation will be based on 4 applications from 4 different domains, which will be deployed at the JUELICH supercomputing center. EXA2PRO will generate exploitable results in the form of a tool-chain that supports diverse exascale heterogeneous supercomputing centers, and concrete improvements in various exascale computing challenges.

  • 108.
    Thorarensen, Sebastian
    et al.
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Cuello, Rosandra
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Li, Lu
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Barry, Brendan
    Movidius Ltd, Ireland.
    Efficient Execution of SkePU Skeleton Programs on the Low-power Multicore Processor Myriad2. 2016. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), IEEE, 2016, p. 398-402. Conference paper (Refereed)
    Abstract [en]

    SkePU is a state-of-the-art skeleton programming library for high-level portable programming and efficient execution on heterogeneous parallel computer systems, with a publicly available implementation for general-purpose multicore CPU and multi-GPU systems. This paper presents the design, implementation and evaluation of a new back-end of the SkePU skeleton programming library for the new low-power multicore processor Myriad2 by Movidius Ltd. This enables seamless code portability of SkePU applications across both HPC and embedded (Myriad2) parallel computing systems, with decent performance, on these architecturally very diverse types of execution platforms.

  • 109.
    Torggler, Manfred
    et al.
    Fernuniv, Germany.
    Keller, Joerg
    Fernuniv, Germany.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Asymmetric Crown Scheduling. 2017. In: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2017), IEEE, 2017, p. 421-425. Conference paper (Refereed)
    Abstract [en]

    Streaming applications are often run on embedded and high-performance multi- and manycore processors. High throughput without wasting energy can be achieved by static scheduling of parallelizable tasks with frequency scaling. We present asymmetric crown scheduling, which improves on the static crown scheduling approach by allowing flexible split ratios when subdividing processor groups. We formulate the scheduler as an integer linear program and evaluate it with synthetic task sets. The results demonstrate that a small number of split ratios improves the energy efficiency of crown schedules by up to 12% at slightly higher scheduling time.
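
    The difference to a symmetric crown can be illustrated by how a group of cores is subdivided: a symmetric crown always splits a group in half, whereas an asymmetric crown may use other split ratios. The sketch below is our own illustration of such a group structure, not the paper's ILP scheduler; it merely prints the groups produced by a chosen split ratio.

        // Illustration of crown-style processor groups: recursively subdivide a
        // range of cores into two subgroups. A symmetric crown uses a 1:1 split;
        // an asymmetric crown may use other ratios (leftFraction is the share of
        // cores given to the left subgroup). This only prints the group structure.
        #include <iostream>

        void printGroups(int first, int count, double leftFraction, int depth = 0) {
            for (int i = 0; i < depth; ++i) std::cout << "  ";
            std::cout << "group: cores " << first << ".." << first + count - 1 << '\n';
            if (count <= 1) return;
            int left = static_cast<int>(count * leftFraction + 0.5);   // rounded split
            if (left < 1) left = 1;
            if (left >= count) left = count - 1;
            printGroups(first, left, leftFraction, depth + 1);
            printGroups(first + left, count - left, leftFraction, depth + 1);
        }

        int main() {
            std::cout << "Symmetric (1:1) subdivision of 8 cores:\n";
            printGroups(0, 8, 0.5);
            std::cout << "\nAsymmetric (1:3) subdivision of 8 cores:\n";
            printGroups(0, 8, 0.25);
        }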

  • 110.
    Wesarg, Bert
    et al.
    Mathematics and Computer Science, FernUniversität in Hagen, Germany.
    Blaar, Holger
    Computer Science, Universität Halle, Germany.
    Keller, Jörg
    Mathematics and Computer Science, FernUniversität in Hagen, Germany.
    Kessler, Christoph
    Linköping University, The Institute of Technology. Linköping University, Department of Computer and Information Science, PELAB - Programming Environment Laboratory.
    Emulating a PRAM on a Parallel Computer. 2007. In: PARS 2007: 21. PARS-Workshop, Hamburg, Germany, May 31 - June 1, 2007, GI/ITG-Fachgruppe Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware (PARS), GI Gesellschaft für Informatik e.V., 2007. Conference paper (Refereed)
    Abstract [en]

    The PRAM is an important model for studying parallel algorithmics, yet its study should be supported by the possibility of implementation and experimentation. With the advent of multicore systems, shared memory programming has also regained importance for applications in practice. For these reasons, a powerful experimental platform should be available. While the language Fork with its development kit allows implementation, the sequential simulator restricts experiments. We develop a simulator for Fork programs on a parallel machine. We report on obstacles and present speedup results of a prototype.

  • 111.
    Ålind, Markus
    et al.
    Linköping University.
    Eriksson, Mattias
    Linköping University, The Institute of Technology. Linköping University, Department of Computer and Information Science, PELAB - Programming Environment Laboratory.
    Kessler, Christoph
    Linköping University, The Institute of Technology. Linköping University, Department of Computer and Information Science, PELAB - Programming Environment Laboratory.
    BlockLib: A Skeleton Library for Cell Broadband Engine. 2008. In: Proceedings - International Conference on Software Engineering, New York, USA: ACM, 2008, p. 7-14. Conference paper (Refereed)
    Abstract [en]

    Cell Broadband Engine is a heterogeneous multicore processor for high-performance computing and gaming. Its architecture allows for an impressive peak performance but, at the same time, makes it very hard to write efficient code. The need to simultaneously exploit SIMD instructions, coordinate parallel execution of the slave processors, overlap DMA memory traffic with computation, keep data properly aligned in memory, and explicitly manage the very small on-chip memory buffers of the slave processors, leads to very complex code. In this work, we adopt the skeleton programming approach to abstract from much of the complexity of Cell programming while maintaining high performance. The abstraction is achieved through a library of parallel generic building blocks, called BlockLib. Macro-based generative programming is used to reduce the overhead of genericity in skeleton functions and control code size expansion. We demonstrate the library usage with a parallel ODE solver application. Our experimental results show that BlockLib code achieves performance close to hand-written code and even outperforms the native IBM BLAS library in cases where several slave processors are used.
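
    One of the complexities listed above, overlapping memory traffic with computation in small on-chip buffers, is commonly handled with double buffering: while one buffer is being processed, the next chunk is fetched into the other. The sketch below shows the generic pattern in plain C++, with std::async standing in for the asynchronous transfers that a skeleton library would issue on Cell; it is an illustration of the pattern, not BlockLib code.

        // Generic double-buffering pattern: process chunk k while chunk k+1 is
        // being fetched into the other buffer. std::async fakes the asynchronous
        // transfer; on Cell this would be a DMA issued by the skeleton library.
        #include <algorithm>
        #include <cstddef>
        #include <functional>
        #include <future>
        #include <iostream>
        #include <numeric>
        #include <vector>

        using Chunk = std::vector<double>;

        Chunk fetchChunk(const std::vector<double>& mainMemory,
                         std::size_t k, std::size_t chunkSize) {
            std::size_t begin = k * chunkSize;
            std::size_t end = std::min(begin + chunkSize, mainMemory.size());
            return Chunk(mainMemory.begin() + begin, mainMemory.begin() + end);
        }

        int main() {
            std::vector<double> data(1000);
            std::iota(data.begin(), data.end(), 0.0);            // 0, 1, 2, ...
            const std::size_t chunkSize = 100;
            const std::size_t numChunks = (data.size() + chunkSize - 1) / chunkSize;

            double sum = 0.0;
            auto next = std::async(std::launch::async, fetchChunk,
                                   std::cref(data), std::size_t{0}, chunkSize);
            for (std::size_t k = 0; k < numChunks; ++k) {
                Chunk current = next.get();                      // wait for chunk k
                if (k + 1 < numChunks)                           // prefetch chunk k+1
                    next = std::async(std::launch::async, fetchChunk,
                                      std::cref(data), k + 1, chunkSize);
                for (double x : current) sum += x;               // compute on chunk k
            }
            std::cout << "sum = " << sum << '\n';                // 0 + ... + 999 = 499500
        }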
