liu.se: Search publications in DiVA
1 - 7 of 7
  • 1.
    Aragon, Elena
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Jimenez, Juan M.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Maghazeh, Arian
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Rasmusson, Jim
    Sony Mobile Communications, Sweden.
    Bordoloi, Unmesh D.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Pattern matching in OpenCL: GPU vs CPU energy consumption on two mobile chipsets. 2014. In: Proceedings of the International Workshop on OpenCL 2013 & 2014 (IWOCL '14), ACM Digital Library, 2014, Article No. 5. Conference paper (Other academic)
    Abstract [en]

    Adaptations of the Aho-Corasick (AC) algorithm on high performance graphics processors (also called GPUs) have garnered increasing attention in recent years. However, no results have been reported regarding their implementations on mobile GPUs. In this paper, we show that implementing a state-of-the-art Aho-Corasick parallel algorithm on a mobile GPU delivers significant speedups. We study a few implementation optimizations some of which may seem counter-intuitive to standard optimizations for high-end GPUs. More importantly, we focus on measuring the energy consumed by different components of the OpenCL application rather than reporting the aggregate. We show that there are considerable energy savings compared to the CPU implementation of the AC algorithm.

  • 2.
    Maghazeh, Arian
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    System-Level Design of GPU-Based Embedded Systems. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Modern embedded systems deploy several hardware accelerators, in a heterogeneous manner, to deliver high-performance computing. Among such devices, graphics processing units (GPUs) have earned a prominent position by virtue of their immense computing power. However, a system design that relies on sheer throughput of GPUs is often incapable of satisfying the strict power- and time-related constraints faced by the embedded systems.

    This thesis presents several system-level software techniques to optimize the design of GPU-based embedded systems under various graphics and non-graphics applications. As compared to the conventional application-level optimizations, the system-wide view of our proposed techniques brings about several advantages: First, it allows for fully incorporating the limitations and requirements of the various system parts in the design process. Second, it can unveil optimization opportunities through exposing the information flow between the processing components. Third, the techniques are generally applicable to a wide range of applications with similar characteristics. In addition, multiple system-level techniques can be combined together or with application-level techniques to further improve the performance.

    We begin by studying some of the unique attributes of GPU-based embedded systems and discussing several factors that distinguish the design of these systems from that of the conventional high-end GPU-based systems. We then proceed to develop two techniques that address an important challenge in the design of GPU-based embedded systems from different perspectives. The challenge arises from the fact that GPUs require a large amount of workload to be present at runtime in order to deliver a high throughput. However, for some embedded applications, collecting large batches of input data requires an unacceptable waiting time, prompting a trade-off between throughput and latency. We also develop an optimization technique for GPU-based applications to address the memory bottleneck issue by utilizing the GPU L2 cache to shorten data access time. Moreover, in the area of graphics applications, and in particular with a focus on mobile games, we propose a power management scheme to reduce the GPU power consumption by dynamically adjusting the display resolution, while considering the user's visual perception at various resolutions. We also discuss the collective impact of the proposed techniques in tackling the design challenges of emerging complex systems.

    The proposed techniques are assessed by real-life experimentations on GPU-based hardware platforms, which demonstrate the superior performance of our approaches as compared to the state-of-the-art techniques.

    List of papers
    1. General Purpose Computing on Low-Power Embedded GPUs: Has It Come of Age?
    2013 (English). In: 13th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2013), Samos, Greece, July 15-18, 2013, IEEE Press, 2013. Conference paper, Published paper (Refereed)
    Abstract [en]

    In this paper we evaluate the promise held by low power GPUs for non-graphic workloads that arise in embedded systems. Towards this, we map and implement 5 benchmarks, that find utility in very different application domains, to an embedded GPU. Our results show that apart from accelerated performance, embedded GPUs are promising also because of their energy efficiency which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as, limited or no user-defined memory, small instruction-set, limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, similar to the spirit of heterogeneous computation.

    Place, publisher, year, edition, pages
    IEEE Press, 2013
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-92626 (URN); 10.1109/SAMOS.2013.6621099 (DOI); 000332458100004 ()
    Conference
    SAMOS'13
    Available from: 2013-05-14. Created: 2013-05-14. Last updated: 2018-12-07.
    2. Saving Energy without Defying Deadlines on Mobile GPU-based Heterogeneous Systems
    2014 (English). In: 2014 International Conference on Hardware/Software Codesign and System Synthesis, Association for Computing Machinery (ACM), 2014. Conference paper, Published paper (Refereed)
    Abstract [en]

    With the advent of low-power programmable compute cores based on GPUs, GPU-equipped heterogeneous platforms are becoming common in a wide spectrum of industries including safety-critical domains like the automotive industry. While the suitability of GPUs for throughput oriented applications is well-accepted, their applicability for real-time applications remains an open issue. Moreover, in mobile/embedded systems, energy-efficient computing is a major concern and yet, there has been no systematic study on the energy savings that GPUs may potentially provide. In this paper, we propose an approach to utilize both the GPU and the CPU in a heterogeneous fashion to meet the deadlines of a real-time application while ensuring that we maximize the energy savings. We note that GPUs are inherently built to maximize the throughput and this poses a major challenge when deadlines must be satisfied. The problem becomes more acute when we consider the fact that GPUs are more energy efficient than CPUs and thus, a naive approach that is based on maximizing GPU utilization might easily lead to infeasible solutions from a deadline perspective.

    Place, publisher, year, edition, pages
    Association for Computing Machinery (ACM), 2014
    National Category
    Computer and Information Sciences
    Identifiers
    urn:nbn:se:liu:diva-112689 (URN); 10.1145/2656075.2656097 (DOI); 978-1-4503-3051-0 (ISBN)
    Conference
    International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS 2014), New Delhi, India, October 12-17, 2014
    Available from: 2014-12-08. Created: 2014-12-08. Last updated: 2018-12-07. Bibliographically approved.
    3. Perception-aware power management for mobile games via dynamic resolution scaling
    2015 (English). In: 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), IEEE, 2015, pp. 613-620. Conference paper, Published paper (Refereed)
    Abstract [en]

    Modern mobile devices provide ultra-high resolutions in their display panels. This imposes ever increasing workload on the GPU leading to high power consumption and shortened battery life. In this paper, we first show that resolution scaling leads to significant power savings. Second, we propose a perception-aware adaptive scheme that sets the resolution during game play. We exploit the fact that game players are often willing to trade quality for longer battery life. Our scheme uses decision theory, where the predicted user perception is combined with a novel asymmetric loss function that encodes users' alterations in their willingness to save power.

    Place, publisher, year, edition, pages
    IEEE, 2015
    Series
    ICCAD-IEEE ACM International Conference on Computer-Aided Design, ISSN 1933-7760
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-124543 (URN); 10.1109/ICCAD.2015.7372626 (DOI); 000368929600084 (); 978-1-4673-8388-2 (ISBN)
    Conference
    Computer-Aided Design (ICCAD), 2015 IEEE/ACM International Conference on, 2-6 Nov. 2015, Austin, TX
    Available from: 2016-02-02. Created: 2016-02-02. Last updated: 2018-12-07.
    4. Latency-Aware Packet Processing on CPU-GPU Heterogeneous Systems
    2017 (English). In: DAC '17 Proceedings of the 54th Annual Design Automation Conference 2017, New York, NY, USA: Association for Computing Machinery (ACM), 2017. Conference paper, Published paper (Refereed)
    Abstract [en]

    In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing applications can, potentially, be satisfied by these throughput-oriented machines. However, another important aspect of such applications is the packet latency which, if not handled carefully, will overshadow the throughput benefits. Unfortunately, until now, this aspect has been mostly ignored. To address this issue, we propose a method that considers the variable bit rate of the traffic and, depending on the current rate, minimizes the latency, while meeting the rate demand. We propose a persistent kernel based software architecture to overcome the challenges inherent in GPU implementation like kernel invocation overhead, CPU-GPU communication and memory access overhead. We have chosen packet classification as the packet processing application to demonstrate our technique. Using the proposed approach, we are able to reduce the packet latency on average by a factor of 3.5, compared to the state-of-the-art solutions, without any packet drop.

    Place, publisher, year, edition, pages
    New York, NY, USA: Association for Computing Machinery (ACM), 2017
    Series
    Design Automation Conference DAC, ISSN 0738-100X
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-141212 (URN); 10.1145/3061639.3062269 (DOI); 000424895400129 (); 2-s2.0-85023612665 (Scopus ID); 978-1-4503-4927-7 (ISBN)
    Conference
    54th ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, June 18-22, 2017
    Available from: 2017-09-27. Created: 2017-09-27. Last updated: 2018-12-07. Bibliographically approved.
  • 3.
    Maghazeh, Arian
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Bordoloi, Unmesh D.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Dastgeer, Usman
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten. Ericsson Sweden.
    Andrei, Alexandru
    Ericsson Sweden.
    Eles, Petru
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Peng, Zebo
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Latency-Aware Packet Processing on CPU-GPU Heterogeneous Systems. 2017. In: DAC '17 Proceedings of the 54th Annual Design Automation Conference 2017, New York, NY, USA: Association for Computing Machinery (ACM), 2017. Conference paper (Refereed)
    Abstract [en]

    In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing applications can, potentially, be satisfied by these throughput-oriented machines. However, another important aspect of such applications is the packet latency which, if not handled carefully, will overshadow the throughput benefits. Unfortunately, until now, this aspect has been mostly ignored. To address this issue, we propose a method that considers the variable bit rate of the traffic and, depending on the current rate, minimizes the latency, while meeting the rate demand. We propose a persistent kernel based software architecture to overcome the challenges inherent in GPU implementation like kernel invocation overhead, CPU-GPU communication and memory access overhead. We have chosen packet classification as the packet processing application to demonstrate our technique. Using the proposed approach, we are able to reduce the packet latency on average by a factor of 3.5, compared to the state-of-the-art solutions, without any packet drop.

  • 4.
    Maghazeh, Arian
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Bordoloi, Unmesh D.
    Linköpings universitet, Institutionen för datavetenskap, ESLAB - Laboratoriet för inbyggda system. Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Eles, Petru
    Linköpings universitet, Institutionen för datavetenskap, ESLAB - Laboratoriet för inbyggda system. Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Peng, Zebo
    Linköpings universitet, Institutionen för datavetenskap, ESLAB - Laboratoriet för inbyggda system. Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    General Purpose Computing on Low-Power Embedded GPUs: Has It Come of Age? 2013. In: 13th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2013), Samos, Greece, July 15-18, 2013, IEEE Press, 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper we evaluate the promise held by low power GPUs for non-graphic workloads that arise in embedded systems. Towards this, we map and implement 5 benchmarks, that find utility in very different application domains, to an embedded GPU. Our results show that apart from accelerated performance, embedded GPUs are promising also because of their energy efficiency which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as, limited or no user-defined memory, small instruction-set, limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, similar to the spirit of heterogeneous computation.

  • 5.
    Maghazeh, Arian
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Bordoloi, Unmesh D.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Horga, Adrian
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Eles, Petru
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Peng, Zebo
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska högskolan.
    Saving Energy without Defying Deadlines on Mobile GPU-based Heterogeneous Systems. 2014. In: 2014 International Conference on Hardware/Software Codesign and System Synthesis, Association for Computing Machinery (ACM), 2014. Conference paper (Refereed)
    Abstract [en]

    With the advent of low-power programmable compute cores based on GPUs, GPU-equipped heterogeneous platforms are becoming common in a wide spectrum of industries including safety-critical domains like the automotive industry. While the suitability of GPUs for throughput oriented applications is well-accepted, their applicability for real-time applications remains an open issue. Moreover, in mobile/embedded systems, energy-efficient computing is a major concern and yet, there has been no systematic study on the energy savings that GPUs may potentially provide. In this paper, we propose an approach to utilize both the GPU and the CPU in a heterogeneous fashion to meet the deadlines of a real-time application while ensuring that we maximize the energy savings. We note that GPUs are inherently built to maximize the throughput and this poses a major challenge when deadlines must be satisfied. The problem becomes more acute when we consider the fact that GPUs are more energy efficient than CPUs and thus, a naive approach that is based on maximizing GPU utilization might easily lead to infeasible solutions from a deadline perspective.

  • 6.
    Maghazeh, Arian
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Bordoloi, Unmesh D.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Villani, Mattias
    Linköpings universitet, Institutionen för datavetenskap, Statistik. Linköpings universitet, Tekniska fakulteten.
    Eles, Petru
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Peng, Zebo
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Perception-aware power management for mobile games via dynamic resolution scaling. 2015. In: 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), IEEE, 2015, pp. 613-620. Conference paper (Refereed)
    Abstract [en]

    Modern mobile devices provide ultra-high resolutions in their display panels. This imposes ever increasing workload on the GPU leading to high power consumption and shortened battery life. In this paper, we first show that resolution scaling leads to significant power savings. Second, we propose a perception-aware adaptive scheme that sets the resolution during game play. We exploit the fact that game players are often willing to trade quality for longer battery life. Our scheme uses decision theory, where the predicted user perception is combined with a novel asymmetric loss function that encodes users' alterations in their willingness to save power.

  • 7.
    Maghazeh, Arian
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Chattopadhyay, Sudipta
    Singapore University of Technology and Design (SUTD), Information Systems Technology and Design (ISTD).
    Eles, Petru
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Peng, Zebo
    Linköpings universitet, Institutionen för datavetenskap, Programvara och system. Linköpings universitet, Tekniska fakulteten.
    Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications. 2019. Conference paper (Refereed)
    Abstract [en]

    We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, where the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between the kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments (which fit in the cache) and, ideally, process each fragment in one continuous sequence of kernel invocations. Our proposed technique is oblivious to kernel functionalities and requires minimal source code modification. We demonstrate our technique on a full-fledged image processing application and improve the performance on average by 30% over various settings.
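
    The tiling idea in the abstract above can be illustrated with a minimal CPU-side sketch. This is not the authors' implementation: it assumes a linear chain of element-wise kernels (so fragments have no dependencies on each other), and the names `run_untiled`, `run_tiled`, `scale`, `offset` and the `tile_size` parameter are hypothetical. On a real GPU, `tile_size` would be chosen so that one fragment's intermediate data fits in the L2 cache.

```python
from typing import Callable, List, Sequence

# A "kernel" here is just a function over a list; hypothetical stand-in
# for a GPU kernel launch.
Kernel = Callable[[List[float]], List[float]]


def run_untiled(kernels: Sequence[Kernel], data: List[float]) -> List[float]:
    """Baseline: each kernel processes the entire input in one invocation."""
    for kernel in kernels:
        data = kernel(data)
    return data


def run_tiled(kernels: Sequence[Kernel], data: List[float],
              tile_size: int) -> List[float]:
    """Split the input into fragments and push each fragment through the
    whole kernel chain before moving on to the next fragment."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    result: List[float] = []
    for start in range(0, len(data), tile_size):
        fragment = data[start:start + tile_size]
        for kernel in kernels:
            # Intermediate results stay fragment-sized between kernels.
            fragment = kernel(fragment)
        result.extend(fragment)
    return result


def scale(xs: List[float]) -> List[float]:
    """Hypothetical element-wise kernel: multiply by 2."""
    return [2.0 * x for x in xs]


def offset(xs: List[float]) -> List[float]:
    """Hypothetical element-wise kernel: add 1."""
    return [x + 1.0 for x in xs]


if __name__ == "__main__":
    pixels = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(run_tiled([scale, offset], pixels, tile_size=2))
```

    For element-wise kernels the tiled and untiled runs produce the same output; only the order of work changes, which is what lets intermediate data be reused while it is still cached.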
