liu.se: Search for publications in DiVA
1 - 11 of 11
  • 1.
    Avdic, Kenan
    et al.
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Melot, Nicolas
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Jörg
    FernUniversität in Hagen.
    Pipelined parallel sorting on the Intel SCC. 2011. In: Fourth Swedish Workshop on Multi-Core Computing MCC-2011: November 23-25, 2011, Linköping University, Linköping, Sweden / [ed] Christoph Kessler, Linköping: Linköping University, 2011, p. 96-101. Conference paper (Other academic)
    Abstract [en]

    The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It comprises 48 Intel-IA32 cores linked by an on-chip high performance mesh network, as well as four DDR3 memory controllers to access an off-chip main memory. We investigate the adaptation of sorting onto SCC as an algorithm engineering problem. We argue that a combination of pipelined mergesort and sample sort will fit best to SCC's architecture. We also provide a mapping based on integer linear programming to address load balancing and latency considerations. We describe a prototype implementation of our proposal together with preliminary runtime measurements, that indicate the usefulness of this approach. As mergesort can be considered as a representative of the class of streaming applications, the techniques developed here should also apply to the other problems in this class, such as many applications for parallel embedded systems, i.e. MPSoC.

  • 2.
    Kessler, Christoph
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Melot, Nicolas
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Eitschberger, Patrick
    FernUniversität in Hagen, Germany.
    Keller, Jörg
    FernUniversität in Hagen, Germany.
    Crown Scheduling: Energy-Efficient Resource Allocation, Mapping and Discrete Frequency Scaling for Collections of Malleable Streaming Tasks. 2013. In: Proceedings of the 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013 / [ed] Jörg Henkel and Alex Yakovlev, IEEE Computer Society Digital Library, 2013, p. 215-222. Conference paper (Refereed)
    Abstract [en]

    We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or malleable tasks on a generic many-core processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data driven way. A stream of data flows through the tasks and intermediate results are forwarded to other tasks like in a pipelined task graph. In this paper we present crown scheduling, a novel technique for the combined optimization of resource allocation, mapping and discrete voltage/frequency scaling for malleable streaming task sets in order to optimize energy efficiency given a throughput constraint. We present optimal off-line algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We also propose extensions for dynamic rescaling to automatically adapt a given crown schedule in situations where not all tasks are data ready. Our energy model considers both static idle power and dynamic power consumption of the processor cores. Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium sized task sets even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds.

  • 3.
    Melot, Nicolas
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Algorithms and Framework for Energy Efficient Parallel Stream Computing on Many-Core Architectures. 2016. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    The rise of many-core processor architectures in the market answers a constantly growing need for processing power to solve increasingly challenging problems, such as those in big data computing. Fast computation is increasingly limited by the very high power required and by the management of the considerable heat produced. Many programming models compete to take advantage of many-core architectures to improve both execution speed and energy consumption, each with its advantages and drawbacks. The work described in this thesis is based on the dataflow computing approach and investigates the benefits of a carefully pipelined execution of streaming applications, focusing in particular on off- and on-chip memory accesses. As a case study, we implement classic and on-chip pipelined versions of mergesort for Intel SCC and Xeon. We see how the benefits of the on-chip pipelining technique are bounded by the underlying architecture, and we explore the problem of fine-tuning streaming applications for many-core architectures to optimize for energy given a throughput budget. We propose a novel methodology to compute schedules optimized for energy efficiency given a fixed throughput target. We introduce Drake, derived from Schedeval, a tool that generates pipelined applications for many-core architectures and allows the performance testing in time or energy of their static schedule. We show that streaming applications based on Drake compete with specialized implementations and we use Schedeval to demonstrate performance differences between schedules that are otherwise considered as equivalent by a simple model.

  • 4.
    Melot, Nicolas
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Energy-Efficient Computing over Streams with Massively Parallel Architectures. 2015. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    The rise of many-core processor architectures in the high-performance computing market answers a constantly growing need for processing power to solve increasingly challenging problems, such as those in big data computing. Fast computation is increasingly limited by the very high power required and by the management of the considerable heat produced. Many programming models compete to take advantage of many-core architectures to improve both execution speed and energy consumption, each with its advantages and drawbacks. The work described in this thesis is based on the dataflow computing approach and investigates the benefits of a carefully designed pipelined execution of streaming applications, focusing in particular on off- and on-chip memory accesses. We implement classic and on-chip pipelined versions of mergesort for the SCC. We see how the benefits of the on-chip pipelining technique are bounded by the underlying architecture, and we explore the problem of fine-tuning streaming applications for manycore architectures to optimize for energy given a throughput budget. We propose a novel methodology to compute schedules optimized for energy efficiency for a fixed throughput target. We introduce Schedeval, a tool to test schedules of communicating streaming tasks under throughput constraints for the SCC. We show that streaming applications based on Schedeval compete with specialized implementations and we use Schedeval to demonstrate performance differences between schedules that are otherwise considered as equivalent by a simple model.

  • 5.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Avdic, Kenan
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, J.
    FernUniversität in Hagen, Fac. of Math. and Computer Science, 58084 Hagen, Germany.
    Investigation of main memory bandwidth on Intel Single-Chip Cloud Computer. 2011. In: 3rd Many-Core Applications Research Community Symposium, MARC 2011, 2011, p. 107-110. Conference paper (Refereed)
    Abstract [en]

    The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It comprises 48 x86 cores linked by an on-chip high performance network, as well as four DDR3 memory controllers to access an off-chip main memory of up to 64GiB. This work evaluates the performance of the SCC when accessing the off-chip memory. The focus of this study is not on taxing the bare hardware. Instead, we are interested in the performance of applications that run on the Linux operating system and use the SCC as it is provided. We see that the per-core read memory bandwidth is largely independent of the number of cores accessing the memory simultaneously, but that the write memory access performance drops when more cores write simultaneously to the memory. In addition, the global and per-core memory bandwidth, both writing and reading, depends strongly on the memory access pattern.

  • 6.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Avdic, Kenan
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Jörg
    FernUniversität in Hagen, Fac. of Math. and Computer Science, Hagen, Germany.
    Memory-intensive parallel computing on the Single Chip Cloud Computer: A case study with Mergesort. 2011. Conference paper (Refereed)
    Abstract [en]

    The Single Chip Cloud computer (SCC) is an experimental processor from Intel Labs with 48 cores connected by a 2D mesh on-chip network. We evaluate the performance of the SCC regarding off-chip memory accesses and communication capabilities. As a benchmark, we use the merging phase of mergesort, a representative memory-access-intensive algorithm. Mergesort is parallelized and adapted in 4 variants, each leveraging different features of the SCC, in order to assess and compare their performance impact. Our results motivate considering on-chip pipelined mergesort on the SCC, which is the subject of ongoing work.

  • 7.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Avdic, Kenan
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Jörg
    FernUniversität in Hagen, Fac. of Math. and Computer Science, Hagen, Germany.
    Parallel sorting on Intel Single-Chip Cloud computer. 2011. In: 3rd Many-core Applications Research Community (MARC) Symposium / [ed] Diana Göhringer, Michael Hübner and Jürgen Becker, Karlsruhe: KIT Scientific Publishing, 2011, 11 p., p. 107-110. Conference paper (Refereed)
    Abstract [en]

    The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It comprises 48 x86 cores linked by an on-chip high performance network, as well as four DDR3 memory controllers to access an off-chip main memory of up to 64GiB. This work evaluates the performance of the SCC when accessing the off-chip memory. The focus of this study is not on taxing the bare hardware. Instead, we are interested in the performance of applications that run on the Linux operating system and use the SCC as it is provided. We see that the per-core read memory bandwidth is largely independent of the number of cores accessing the memory simultaneously, but that the write memory access performance drops when more cores write simultaneously to the memory. In addition, the global and per-core memory bandwidth, both writing and reading, depends strongly on the memory access pattern.

  • 8.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Janzen, Johan
    Uppsala University, Sweden.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Mimer and Schedeval: Tools for Comparing Static Schedulers for Streaming Applications on Manycore Architectures. 2015. In: 2015 44th International Conference on Parallel Processing Workshops, IEEE, 2015, p. 146-155. Conference paper (Refereed)
    Abstract [en]

    Scheduling algorithms published in the scientific literature are often difficult to evaluate or compare due to differences between the experimental evaluations in any two papers on the topic. Very few researchers share the details about the scheduling problem instances they use in their evaluation section, the code that allows them to transform the numbers they collect into the results and graphs they show, nor the raw data produced in their experiments. Also, many scheduling algorithms published are not tested against a real processor architecture to evaluate their efficiency in a realistic setting. In this paper, we describe Mimer, a modular evaluation tool-chain for static schedulers that enables the sharing of evaluation and analysis tools employed to elaborate scheduling papers. We propose Schedeval that integrates into Mimer to evaluate static schedules of streaming applications under throughput constraints on actual target execution platforms. We evaluate the performance of Schedeval at running streaming applications on the Intel Single-Chip Cloud computer (SCC), and we demonstrate the usefulness of our tool-chain to compare existing scheduling algorithms. We conclude that Mimer and Schedeval are useful tools to study static scheduling and to observe the behavior of streaming applications when running on manycore architectures.

  • 9.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Avdic, Kenan
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Cichowski, Patrick
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Keller, Jorg
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Engineering parallel sorting for the Intel SCC. 2012. In: Procedia Computer Science, ISSN 1877-0509, E-ISSN 1877-0509, Vol. 9, p. 1890-1899. Article in journal (Refereed)
    Abstract [en]

    The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It comprises 48 Intel-x86 cores linked by an on-chip high performance mesh network, as well as four DDR3 memory controllers to access an off-chip main memory. We investigate the adaptation of sorting onto SCC as an algorithm engineering problem. We argue that a combination of pipelined mergesort and sample sort will fit best to SCC's architecture. We also provide a mapping based on integer linear programming to address load balancing and latency considerations. We describe a prototype implementation of our proposal together with preliminary runtime measurements, that indicate the usefulness of this approach. As mergesort can be considered as a representative of the class of streaming applications, the techniques developed here should also apply to the other problems in this class, such as many applications for parallel embedded systems, i.e. MPSoC.

  • 10.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Joerg
    FernUniversität in Hagen, Germany (Parallelität und VLSI).
    Eitschberger, Patrick
    FernUniversität in Hagen, Germany (Parallelität und VLSI).
    Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems. 2015. In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, Vol. 11, no 4, 62- p. Article in journal (Refereed)
    Abstract [en]

    Effectively exploiting massively parallel architectures is a major challenge that stream programming can help address. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider crown scheduling, a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic Longest Task, Lowest Group (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the Height heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.

    Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium-sized streaming task collections even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespan and energy consumption close to optimality within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than that of the other algorithms we test, but our heuristics consistently produce better solutions.
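
    The abstract above names the Longest Processing Time (LPT) algorithm as the starting point that LTLG generalizes. As a rough illustration only, the following sketch shows classic LPT for sequential tasks: tasks are taken in order of decreasing duration and each is placed on the currently least-loaded core. It is not the LTLG heuristic from the article, which additionally handles moldable (parallel) tasks and core groups; the function name `lpt_schedule` and its parameters are hypothetical.

```python
import heapq


def lpt_schedule(durations, num_cores):
    """Classic LPT list scheduling for sequential tasks (illustrative sketch).

    durations: list of task processing times, indexed by task id.
    num_cores: number of identical cores.
    Returns (assignment, makespan), where assignment[c] lists the task ids
    mapped to core c. This is NOT the LTLG heuristic of the article.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")

    # Min-heap of (current load, core id): the root is the least-loaded core.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]

    # Consider the longest tasks first; ties keep their original order.
    order = sorted(range(len(durations)), key=lambda t: durations[t], reverse=True)
    for task in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(task)
        heapq.heappush(heap, (load + durations[task], core))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    mapping, span = lpt_schedule([7, 5, 4, 3, 3, 2], num_cores=2)
    print(mapping, span)
```

    LPT is a greedy heuristic, so the makespan it returns is not guaranteed to be optimal in general; a min-heap keeps the selection of the least-loaded core at logarithmic cost per task.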

  • 11.
    Melot, Nicolas
    et al.
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Kessler, Christoph
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology.
    Keller, Jörg
    FernUniversität in Hagen, Fac. of Math. and Computer Science, Hagen, Germany.
    Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures. 2012. Conference paper (Other academic)
    Abstract [en]

    Performance of manycore processors is limited by programs' use of off-chip main memory. Streaming computation organized in a pipeline restricts main memory accesses to the tasks at the boundaries of the pipeline. The Single Chip Cloud computer (SCC) offers 48 cores linked by a high-speed on-chip network, and allows the implementation of such an on-chip pipelining technique. We assess the performance and constraints of the SCC and investigate on-chip pipelined mergesort as a case study for streaming computations. We found that our on-chip pipelined mergesort yields significant speedup over classic parallel mergesort on SCC. The technique should bring improvement in power consumption and should be portable to other manycore, network-on-chip architectures such as Tilera's processors.
