
Software caching techniques and hardware optimizations for on-chip local memories

Although caches remain the most viable L1 memories in processors, on-chip local memories have received considerable attention lately. Local memories are an interesting design option because of their many benefits: smaller area, reduced energy consumption, and fast, constant access time. These benefits are especially attractive for the design of modern multicore processors, since power and latency are critical concerns in computer architecture today. In addition, local memories do not generate coherence traffic, which is important for the scalability of multicore systems. Unfortunately, local memories have not yet been widely adopted in modern processors, mainly because of their poor programmability. Systems with on-chip local memories have no hardware support for transparent data transfers between local and global memories, so the difficulty of programming them is one of the main impediments to their broad acceptance.

This thesis addresses software and hardware optimizations for the programmability and the use of on-chip local memories in both single-core and multicore systems. The software optimizations concern software caching techniques. A software cache is a robust way to give the user a transparent view of the memory architecture, but it can suffer from poor performance. In this thesis, we start by optimizing the traditional software cache, proposing a hierarchical, hybrid software-cache architecture. We then develop a few optimizations to speed up our hybrid software cache as much as possible. As a result of these software optimizations, our hybrid software cache performs from 4 to 10 times faster than a traditional software cache on a set of NAS parallel benchmarks.

We do not stop at software caching. To improve the performance of architectures with on-chip local memories, we cover other aspects of them, such as the quality of the generated code and its correspondence with the quality of the buffer management in local memories. We pursue this research until we reach the limit of what software can achieve, and then propose optimizations at the hardware level. Two hardware proposals are presented in this thesis. One relaxes the alignment constraints imposed in architectures with on-chip local memories; the other accelerates the management of local memories by providing hardware support for the majority of actions performed in our software cache.

Although caches are still the basic component in the design of the memory subsystem, local memories have become an alternative because of their characteristics in terms of area occupancy, energy consumption, and performance, with fast and constant access time. These characteristics are of special interest now that upcoming multicore architectures are limited by power consumption and the latency of the memory subsystem. Local memories suffer from limitations regarding the complexity of programming them, which hinders their introduction in multicore architectures despite the advantages mentioned above. This thesis presents a set of solutions based on software and hardware specifically designed to overcome these limitations.

The software optimizations are based on caching techniques supported by specific libraries. A software cache is a solid method for giving the user a transparent view of the architecture, but this approach can suffer from poor performance. This thesis proposes a hierarchical, hybrid structure. We then develop optimizations to accelerate the execution of the software that supports the cache design. As a result of these optimizations, our hybrid design runs 4 to 10 times faster than a traditional software cache implementation on a set of reference applications, the NAS parallel benchmarks.

The thesis also covers other aspects of architectures with local memories, such as the quality of the generated code and its correspondence with the quality of buffer management in local memories, in order to improve the performance of these architectures. It develops proposals based strictly on the design of new hardware to improve the performance of local memories once no further software optimizations are possible. In particular, the thesis presents two hardware proposals: one relaxes the data alignment restrictions imposed by local memories, and the other introduces dedicated hardware to accelerate the most common operations on local memories.
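
To make the idea of a software cache concrete, the following is a minimal sketch of a generic direct-mapped, write-back software cache that stages lines of a simulated global memory in a small "local memory" buffer. It is not the hierarchical, hybrid design proposed in the thesis, whose details the abstract does not give; the class name `SoftwareCache`, its parameters, and the list-based memory model are all assumptions made for illustration.

```python
class SoftwareCache:
    """Hypothetical direct-mapped, write-back software cache.

    Assumes len(global_mem) is a multiple of line_size.
    """

    def __init__(self, global_mem, num_lines=4, line_size=8):
        self.global_mem = global_mem      # simulated global memory (list of ints)
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines    # global line number held by each slot
        self.dirty = [False] * num_lines  # slot modified since it was loaded?
        self.local = [[0] * line_size for _ in range(num_lines)]  # "local memory"
        self.hits = 0
        self.misses = 0

    def _evict(self, slot):
        # Write a modified line back to global memory before the slot is reused.
        if self.tags[slot] is not None and self.dirty[slot]:
            base = self.tags[slot] * self.line_size
            self.global_mem[base:base + self.line_size] = self.local[slot]
        self.dirty[slot] = False

    def _lookup(self, addr):
        # Every access pays for this tag check in software; this is the
        # overhead that makes a naive software cache slow.
        line_no, offset = divmod(addr, self.line_size)
        slot = line_no % self.num_lines
        if self.tags[slot] == line_no:
            self.hits += 1
        else:
            self.misses += 1
            self._evict(slot)
            base = line_no * self.line_size
            # Stands in for an explicit transfer from global to local memory.
            self.local[slot] = self.global_mem[base:base + self.line_size]
            self.tags[slot] = line_no
        return slot, offset

    def read(self, addr):
        slot, offset = self._lookup(addr)
        return self.local[slot][offset]

    def write(self, addr, value):
        slot, offset = self._lookup(addr)
        self.local[slot][offset] = value
        self.dirty[slot] = True

    def flush(self):
        # Write back all modified lines; cached contents stay valid.
        for slot in range(self.num_lines):
            self._evict(slot)


mem = list(range(64))
cache = SoftwareCache(mem, num_lines=4, line_size=8)
total = sum(cache.read(i) for i in range(16))  # touches two lines
cache.write(3, 100)                            # stays local until flushed
cache.flush()
```

In this sketch the program calls `read` and `write` with global addresses and never manages the local buffer directly, which is the transparency the abstract refers to; the per-access tag check in `_lookup` is the kind of cost that software-level optimizations and hardware support aim to reduce.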
Universitat Politècnica de Catalunya

Related Results

Performance simulation methodologies for hardware/software co-designed processors
Recently the community started looking into Hardware/Software (HW/SW) co-designed processors as potential solutions to move towards the less power consuming and the less complex de...
Towards Intelligent Zone-Based Content Pre-Caching Approach in VANET for Congestion Control
In vehicular ad hoc networks (VANETs), content pre-caching is a significant technology that improves network performance and lowers network response delay. VANET faces network cong...
Joint caching and sleeping optimisation for D2D‐aided ultra‐dense network
Device‐to‐device (D2D) communication provides the communication of the users in the vicinity and thereby decreases end‐to‐end delay and power consumption. More importantly, D2D com...
Comparing genome-wide chromatin profiles using ChIP-chip or ChIP-seq
Motivation: ChIP-chip and ChIP-seq technologies provide genome-wide measurements of various types of chromatin marks at an unprecedented resolution. With ChIP samples colle...
Optimal Video Caching at The Edge of Network by Using Machine Learning
Efficiently managing network resources in the dynamic field of video-on-demand (VoD) services is a significant challenge. This requires creative methods to optimiz...
Intelligent Caching for Mobile Video Streaming in Vehicular Networks with Deep Reinforcement Learning
Caching-enabled multi-access edge computing (MEC) has attracted wide attention to support future intelligent vehicular networks, especially for delivering high-definition videos in...
Software driven approach for Embedded Devices
This paper presents the possible new design paradigm that emerged during the author’s design of an embedded communication device for Croatian Navy. Prior to codesign techniques tha...
Abstract 4146122: Potential Protective Roles of Clonal Hematopoiesis of Indeterminate Potential in Angina Pectoris
Introduction: Clonal hematopoiesis of indeterminate potential (CHIP) poses strong relationship to the occurrence of cardiovascular diseases with the process of aging. I...
