In computer science, '''spatial architectures''' are a kind of [[computer architecture]] leveraging many collectively coordinated and directly communicating processing elements to quickly and efficiently handle [[embarrassingly parallel|highly parallelizable]] [[compute kernel|kernels]].
The "spatial" term comes from processing element instances being typically arranged in 1D or 2D array, both logically, and in the silicon design.
The "spatial" term comes from processing element instances being typically arranged in 1D or 2D array, both logically, and in the silicon design.
Their most common workloads are [[matrix multiplication]], [[convolutional neural network#Convolutional_layers|convolution]] and, more generally, [[tensor contraction]].
As such, spatial architectures are often used in [[deep learning]] [[hardware accelerators]].


The key goal of spatial architectures is to reduce the latency and power consumption of running very large kernels.
Spatial architectures can be designed or programmed to support different algorithms, which are mapped onto the processing elements using a specialized [[dataflow]] (borrowing the "dataflow" term to describe how computation and data movement are orchestrated across the array).<ref name="eyeriss" />
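
As a minimal illustrative sketch (the function and parameter names below, such as <code>grid_rows</code> and <code>grid_cols</code>, are hypothetical and not taken from the cited sources), the following Python model shows how one possible dataflow, often called ''output stationary'', maps a matrix multiplication onto a logical 2D array of processing elements: each element keeps one output value in a local accumulator while the reduction dimension streams through its multiply-and-accumulate unit.

<syntaxhighlight lang="python">
def output_stationary_matmul(A, B, grid_rows, grid_cols):
    """Compute C = A @ B on a logical grid_rows x grid_cols PE array.

    Each PE (i, j) owns one output element of the current tile and reduces
    the K dimension locally with its multiply-and-accumulate unit.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0] * N for _ in range(M)]
    # Tile the output so that each tile matches the shape of the PE array.
    for ti in range(0, M, grid_rows):
        for tj in range(0, N, grid_cols):
            for i in range(ti, min(ti + grid_rows, M)):      # PE row index
                for j in range(tj, min(tj + grid_cols, N)):  # PE column index
                    acc = 0  # accumulator register local to PE (i, j)
                    for k in range(K):
                        acc += A[i][k] * B[k][j]  # one MAC (per cycle in hardware)
                    C[i][j] = acc
    return C

# Example: a 4x4 matrix multiplication mapped onto a 2x2 PE array.
A = [[1, 2, 3, 4]] * 4
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(output_stationary_matmul(A, B, 2, 2))
</syntaxhighlight>

The model captures only the functional behaviour of the mapping; in hardware, all processing elements assigned to a tile operate in parallel, each performing one multiply-and-accumulate per cycle.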

==Design details==

The heart of a spatial architecture is its multidimensional array of processing elements.
Each processing element is simple, namely a [[Multiply–accumulate operation|multiply-and-accumulate]] [[execution unit|functional unit]] or a stripped-down [[cpu|core]].
The processing elements (PEs) are then connected with each other and with a memory hierarchy through [[bus (computing)|buses]] or a [[network on a chip|network-on-chip]].
The memory hierarchy is explicitly managed and consists of multiple on-chip buffers such as [[register file]]s, SRAM [[scratchpad memory|scratchpads]] and FIFOs, backed by a large off-chip [[dynamic random-access memory|DRAM]] and non-volatile memories.<ref name="eyeriss" />
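
To make the role of direct PE-to-PE communication concrete, the following minimal sketch (an illustrative assumption rather than a description of any specific accelerator; all class and function names are hypothetical) models a chain of weight-stationary processing elements computing a 1D convolution: each PE keeps one filter tap in a local register and forwards its partial sum over a direct link, so intermediate values never travel through the memory hierarchy.

<syntaxhighlight lang="python">
class PE:
    """A minimal processing element: one weight register and one MAC unit."""

    def __init__(self, weight):
        self.weight = weight  # value held stationary in the PE's register file

    def mac(self, psum_in, x_in):
        # Multiply-and-accumulate, then pass the partial sum downstream.
        return psum_in + self.weight * x_in


def conv1d_on_pe_chain(x, w):
    pes = [PE(wj) for wj in w]          # one PE per filter tap
    y = []
    for i in range(len(x) - len(w) + 1):
        psum = 0
        for j, pe in enumerate(pes):    # partial sum hops from PE to PE
            psum = pe.mac(psum, x[i + j])
        y.append(psum)                  # the last PE writes the result to a buffer
    return y


print(conv1d_on_pe_chain([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
</syntaxhighlight>

Holding the weights stationary is only one possible choice; other dataflows keep inputs or outputs in place inside the PEs instead.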

The key performance metrics for a spatial architecture are its energy, latency, and energy-delay product (EDP) when running a given workload.
Due to technology and bandwidth limitations, the energy and latency required to access larger memories, like DRAM, dominate those of computation, often being hundreds of times higher than those of storage located near the PEs.<ref name="tpuv4" />
For this reason, a spatial architecture's memory hierarchy is designed to keep most repeated accesses to the same values in faster and more efficient on-chip memories, exploiting data reuse to minimize costly off-chip accesses.
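
A back-of-the-envelope sketch illustrates the point; the per-access energy figures below are assumed, order-of-magnitude placeholders rather than values taken from the cited papers, and the access counts are deliberately simplified.

<syntaxhighlight lang="python">
# Assumed, order-of-magnitude per-access energies in picojoules (illustrative only).
ENERGY_PJ = {"dram": 640.0, "sram_scratchpad": 5.0, "register": 0.1, "mac": 0.2}


def kernel_energy_pj(dram, scratchpad, register, macs):
    """Total energy of a kernel given its access and operation counts."""
    return (dram * ENERGY_PJ["dram"]
            + scratchpad * ENERGY_PJ["sram_scratchpad"]
            + register * ENERGY_PJ["register"]
            + macs * ENERGY_PJ["mac"])


# 1024 x 1024 x 1024 matrix multiplication: 1024^3 MACs, two operand reads each.
macs = 1024 ** 3
operand_reads = 2 * macs

# No reuse: every operand read is served directly by DRAM.
no_reuse = kernel_energy_pj(operand_reads, 0, 0, macs)

# With reuse: each matrix element crosses DRAM once; most reads hit PE-local
# registers, with occasional scratchpad refills (counts simplified on purpose).
with_reuse = kernel_energy_pj(3 * 1024 ** 2, operand_reads // 32, operand_reads, macs)

print(f"no reuse:   {no_reuse / 1e12:.3f} J")
print(f"with reuse: {with_reuse / 1e12:.3f} J")
</syntaxhighlight>

A similar tally of memory and compute cycles yields the latency, and the EDP then follows as the product of the two.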


==Examples==


* [[NVDLA]]
* [[Tensor Processing Unit]]
* [[Tensor Core]]
* [[AMD AI Engine]]<ref name="amd_ai_engine" />


==See also==


<references>
<ref name="eyeriss">{{cite journal|last1=Chen|first1=Yu-Hsin|last2=Emer|first2=Joel|last3=Sze|first3=Vivienne|journal=2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)|title=Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks|year=2016|pages=367-379|doi=10.1109/ISCA.2016.40}}</ref>
<ref name="eyeriss">Content</ref>
<ref name="horowitz">* {{cite journal|last1=Horowitz|first1=Mark|journal=2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)|title = 1.1 Computing's energy problem (and what we can do about it)|year=2014|pages=10-14|doi=10.1109/ISSCC.2014.6757323}}</ref>
<ref name="horowitz">Content</ref>
<ref name="flexibility">Content</ref>
<ref name="maestro_flexibility">Content</ref>
<ref name="tpuv1">Content</ref>
<ref name="tpuv1">Content</ref>
<ref name="tpuv4">Content</ref>
<ref name="tpuv4">Content</ref>
<ref name="simba">Content</ref>
<ref name="simba">Content</ref>
<ref name="gemmini">Content</ref>
<ref name="gemmini">Content</ref>
<ref name="amd_ai_engine">{{cite web|url=https://www.amd.com/en/products/adaptive-socs-and-fpgas/technologies/ai-engine.html|title=AMD AI Engine Technology|website=amd.com|date=2025-07-07}}</ref>
<ref name="silvano_survey">Content</ref>
<ref name="silvano_survey">Content</ref>
<ref name="berkeley_survey">Content</ref>
<ref name="berkeley_survey">Content</ref>
==Further reading==


* {{cite book |first1=Vivienne |last1=Sze |first2=Yu-Hsin |last2=Chen |title=Efficient Processing of Deep Neural Networks |year=2022 |publisher=Morgan & Claypool Publishers |isbn=978-3-031-01766-7}}


==External links==