
Computer_Architecture
Average article quality score: 90
EverNoob
simply bumping around
Cache Invalidation
Learn how.
Reposted 2024-01-30 16:19:17 · 260 views · 0 comments
TorchSparse: 3D SC/SSC Acceleration on GPU
Paper: TorchSparse: Efficient Point Cloud Inference Engine. Notation / Mapping: to get the output position set when down-sampling, since we want to sample as many sparse input sites as possible, we relax the SSC I/O mapping condition to p < s*...
Original 2022-05-26 17:19:00 · 601 views · 0 comments
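The snippet's exact relaxed mapping condition is cut off, so the sketch below does not reproduce it. It only illustrates the general idea of building the output site set and the input-to-output map for a strided (down-sampling) sparse convolution, under the simplifying assumption that kernel size equals stride, so each input site p maps to exactly one output site floor(p / s). The function name `downsample_map` is hypothetical and is not TorchSparse's API.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]

def downsample_map(inputs: Sequence[Coord], stride: int) -> Tuple[List[Coord], List[Tuple[int, int]]]:
    """Hypothetical helper: output sites and (input_idx, output_idx) pairs.

    Assumes kernel_size == stride, so every input site p contributes to the
    single output site floor(p / stride). Output coords are in the coarse grid.
    """
    out_index: Dict[Coord, int] = {}
    pairs: List[Tuple[int, int]] = []
    for i, p in enumerate(inputs):
        q = tuple(c // stride for c in p)            # floor division per axis
        j = out_index.setdefault(q, len(out_index))  # new site gets the next index
        pairs.append((i, j))
    outputs = list(out_index)  # dicts keep insertion order, matching the indices
    return outputs, pairs

outputs, pairs = downsample_map([(0, 0, 0), (1, 1, 0), (2, 0, 1), (3, 3, 3)], 2)
```

Duplicate coarse coordinates collapse into one output site, which is why the output set is usually much smaller than the input set.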
AXI Protocol and AMBA AXI
The important one for hardware is AMBA AXI, the Advanced eXtensible Interface of Arm's AMBA (Advanced Microcontroller Bus Architecture). See Arm's own documentation for a controlled learning experience: Documentation – Arm Developer. For a comprehensive coverage of all
Reposted 2022-04-18 17:00:08 · 982 views · 0 comments
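Every AXI channel moves data with the same VALID/READY handshake: the source drives VALID, the destination drives READY, and a transfer completes only on a clock edge where both are high. The cycle-level toy model below is a sketch under that rule only; it is not RTL, ignores bursts and IDs, and the function name `axi_channel` is hypothetical.

```python
from typing import List, Sequence, Tuple

def axi_channel(beats: Sequence[int], ready_pattern: Sequence[bool]) -> List[Tuple[int, int]]:
    """Toy model of one AXI channel; returns (cycle, beat) for each handshake."""
    log: List[Tuple[int, int]] = []
    i = 0  # index of the beat the source is currently presenting
    for cycle, ready in enumerate(ready_pattern):
        # The source asserts VALID whenever it has data, without waiting for READY.
        valid = i < len(beats)
        if valid and ready:
            log.append((cycle, beats[i]))  # transfer happens on this edge
            i += 1                         # only now may the source move on
        # Otherwise the source holds VALID and the same data stable.
    return log

transfers = axi_channel([0xA, 0xB, 0xC], [False, True, True, False, True])
```

Here the sink stalls in cycles 0 and 3, so the three beats complete in cycles 1, 2 and 4.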
Semiconductor Engineering Vocabulary
Recommended general sources: WikiChip; Semiconductor Engineering - Deep Insights For Chip Engineers. Chip, wafer, die, mask: Chips, wafers, dies, masks, and photolithography. Chip vs. die: sometimes used interchangeably, else chip would specifically refer.
Original 2022-03-22 15:29:34 · 698 views · 0 comments
Numpy and SIMD
NumPy is by design a SIMD-style (single instruction, multiple data) structure, which is best exemplified by the list indexing feature (python - How to filter numpy array by list of indices? - Stack Overflow):
filter_indices = [1,3,5]
np.array([11,13,155,22,0xff,32,56,88])[filter_indices]
Original 2022-02-17 11:02:35 · 1539 views · 0 comments
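A runnable version of the indexing example, next to a boolean-mask filter and the equivalent scalar Python loop, shows the "one operation over many elements" style. Whether the compiled loops behind these calls actually emit SIMD instructions depends on the NumPy build and the CPU; that part is an assumption, not something this snippet can show.

```python
import numpy as np

data = np.array([11, 13, 155, 22, 0xff, 32, 56, 88])
filter_indices = [1, 3, 5]

picked = data[filter_indices]   # integer-array ("fancy") indexing, returns a copy
mask = data > 50                # one element-wise comparison over the whole array
large = data[mask]              # boolean-mask indexing keeps elements where mask is True

# The same selection written as a scalar Python loop, one element at a time.
loop_version = [data[i] for i in filter_indices]
```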
Arm vs. x86
Arm vs x86: Instruction sets, architecture, and more differences explained. Android is capable of running on three different types of processor architecture: Arm, Intel, and MIPS. The first is today’s ubiquitous architecture after Intel abandoned its hand
Reposted 2022-02-10 14:54:49 · 238 views · 0 comments
Out of Order (OoO) and Speculative Execution
Speculative Execution: What Is Speculative Execution? - ExtremeTech (https://www.extremetech.com/computing/261792-what-is-speculative-execution). With an AMD-centric potential security flaw in the news, it’s a good time to revisit the question of what speculati.
Reposted 2022-01-30 18:06:35 · 769 views · 0 comments
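Speculative execution needs a guess about which way a branch will go; the core runs ahead on that guess and squashes the work if it was wrong. The classic textbook guesser is a 2-bit saturating counter, sketched below. This is an illustrative stand-in, not the predictor the ExtremeTech article describes (modern CPUs use far more elaborate designs), and `predict_run` is a hypothetical name.

```python
from typing import Iterable

def predict_run(outcomes: Iterable[bool], state: int = 2) -> int:
    """2-bit saturating counter; returns how many branches were predicted correctly.

    States 0 and 1 predict not-taken, states 2 and 3 predict taken.
    """
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1   # speculative work on this path would be kept
        # On a mispredict the speculative work would be squashed (not modeled).
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# A loop branch taken 9 times then exiting, executed twice.
loop_hits = predict_run(([True] * 9 + [False]) * 2)
```

The two-bit hysteresis means a single loop exit costs one mispredict and does not flip the prediction for the next run of the loop.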
Predication in Computer Architecture
Overview (from: Predication). Predication is the conditional execution of instructions. Conditional execution is implemented through branches in traditional architectures. Predication removes branches used for conditional execution. Predicated execution avo.
Reposted 2022-01-25 15:48:52 · 177 views · 0 comments
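The contrast between a branch and predication can be sketched in Python, which has no predicated instructions of its own. The second function emulates what a predicated ISA (or a conditional-select instruction) does: compute both candidate results unconditionally, then let a 0/1 predicate decide which one is committed. Both function names are hypothetical.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    """Conventional version: a conditional branch picks one path."""
    if a > b:
        return a - b
    return b - a

def abs_diff_predicated(a: int, b: int) -> int:
    """Branch-free version: both paths execute, the predicate selects the result."""
    p = int(a > b)                     # predicate "register": 1 or 0
    taken_path = a - b                 # executed regardless of p
    other_path = b - a                 # also executed regardless of p
    return p * taken_path + (1 - p) * other_path
```

The cost of this style is that both paths always run, which is why predication pays off mainly for short, hard-to-predict conditionals.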
de/interleave with ARM
ARM Document: NEON structure loads read data from memory into 64-bit NEON registers, with optional de-interleaving. Stores work similarly, interleaving data from registers before writing it to memory, as Figure 6.4 shows. For more information see VLDn and V...
Reposted 2022-01-25 14:51:07 · 269 views · 0 comments
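The de-interleave/interleave behaviour of a 3-element structure load and store can be modeled in plain Python with packed RGB pixels. This is only a model of the data movement: it ignores register width (a real `vld3_u8` from `arm_neon.h` fills three 64-bit registers with 8 lanes each), and the names `vld3`/`vst3` below are hypothetical Python stand-ins, not intrinsics.

```python
from typing import List, Sequence, Tuple

def vld3(mem: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Model of a structure load: element k of every struct goes to register k."""
    if len(mem) % 3:
        raise ValueError("memory must hold whole 3-element structures")
    return list(mem[0::3]), list(mem[1::3]), list(mem[2::3])

def vst3(r: Sequence[int], g: Sequence[int], b: Sequence[int]) -> List[int]:
    """Model of a structure store: re-interleave lanes before writing to memory."""
    out: List[int] = []
    for triple in zip(r, g, b):
        out.extend(triple)
    return out

pixels = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]  # R, G, B, R, G, B, ...
r, g, b = vld3(pixels)
```

After the load each "register" holds one colour channel, so per-channel arithmetic becomes a plain lane-wise operation; the store puts the pixels back in their original packed order.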
Deep Learning with 4-bit systems (int4)
4-bit introduction paper: https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8cb45877d577-Paper.pdf
4-bit CNN paper: https://arxiv.org/pdf/2009.06488.pdf
Short news articles: https://medium.com/swlh/4-bit-deep-learning-d1614c0883e3, https://towa..
Reposted 2021-12-25 11:04:42 · 238 views · 0 comments
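To make "int4" concrete, here is a generic symmetric per-tensor quantizer plus nibble packing (two 4-bit values per byte). This is an assumed, common scheme for illustration; it is not the method of the linked papers, and all function names are hypothetical.

```python
from typing import List, Tuple

def quantize_int4(xs: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization; results land in [-7, 7] (-8 stays unused)."""
    max_abs = max(abs(x) for x in xs) or 1.0      # avoid a zero scale for all-zero input
    scale = max_abs / 7.0
    q = [max(-8, min(7, round(x / scale))) for x in xs]  # clamp kept as a safety net
    return q, scale

def pack_int4(q: List[int]) -> bytes:
    """Pack two signed 4-bit values per byte, low nibble first (odd length padded with 0)."""
    vals = list(q) + [0] * (len(q) % 2)
    return bytes((lo & 0xF) | ((hi & 0xF) << 4) for lo, hi in zip(vals[0::2], vals[1::2]))

def unpack_int4(buf: bytes) -> List[int]:
    """Inverse of pack_int4; sign-extends each nibble back to a Python int."""
    out: List[int] = []
    for byte in buf:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out

q, scale = quantize_int4([7.0, -3.2, 0.4, 2.6])
packed = pack_int4(q)
```

Dequantizing is `v * scale` per element; packing halves memory traffic relative to int8, which is much of the appeal of 4-bit inference.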
NUMA Collections
Simple intro: wiki entry. Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory...
Original 2021-11-19 20:58:47 · 326 views · 0 comments
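Since access time depends on where the data lives, a first-order way to reason about NUMA is a weighted average of local and remote latency. The sketch below assumes just two latency classes; real systems have per-node distances, and the latency values in any call are placeholders, not measurements.

```python
def effective_latency_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Average memory access time when `local_fraction` of accesses hit local memory.

    Two-class model (local vs. remote); latencies are caller-supplied assumptions.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

half_remote = effective_latency_ns(0.5, 100.0, 200.0)  # hypothetical latencies
```

The model makes the motivation for NUMA-aware placement explicit: every access moved from remote to local memory lowers the average by (remote_ns - local_ns) divided by the total number of accesses.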
TPUv4/4i: 4th Generation DL DSA
From Ten Lessons From Three Generations Shaped Google’s TPUv4i. Evolution of ML DSA: for TPUv1 see TPUv1: Single Chipped Inference DL DSA (maxzcl's blog, 优快云); for TPUv2/3 see https://blog.youkuaiyun.com/maxzcl/article/details/121399583; for TPUv1 to TPUv2 see TPUv...
Original 2021-11-19 21:33:59 · 1466 views · 0 comments
DMA Collections
Overview: https://en.wikipedia.org/wiki/Direct_memory_access#Modes_of_operation, https://www.silabs.com/documents/public/application-notes/AN0013.pdf
Performance: https://indico.cern.ch/event/453673/contributions/1951556/attachments/1170310/1689185/DMA_pe
Original 2021-11-19 11:16:53 · 219 views · 0 comments
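Two of the DMA modes of operation covered in the linked overview differ mainly in who owns the bus and for how long. The toy schedule below contrasts burst mode (DMA holds the bus for the whole block) with cycle stealing (DMA takes the bus for one word, then hands it back). It assumes a single shared bus and one word or one CPU bus access per cycle, omits transparent mode, and `bus_schedule` is a hypothetical name.

```python
from typing import List

def bus_schedule(words: int, cpu_cycles: int, mode: str) -> List[str]:
    """Return the bus owner ('DMA' or 'CPU') for each cycle in a toy one-bus system."""
    schedule: List[str] = []
    if mode == "burst":
        # The DMA controller keeps the bus until the whole block is transferred.
        schedule.extend(["DMA"] * words)
        schedule.extend(["CPU"] * cpu_cycles)
    elif mode == "cycle_stealing":
        # The DMA controller steals one cycle per word, alternating with the CPU.
        dma_left, cpu_left = words, cpu_cycles
        while dma_left or cpu_left:
            if dma_left:
                schedule.append("DMA")
                dma_left -= 1
            if cpu_left:
                schedule.append("CPU")
                cpu_left -= 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return schedule
```

Burst mode finishes the transfer soonest but stalls the CPU for the whole block; cycle stealing stretches the transfer out while the CPU keeps making progress.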
TPUv2/v3 Design Process
The Design Process for Google’s Training Chips: TPUv2 and TPUv3. Breakdown of the accompanying paper: https://blog.youkuaiyun.com/maxzcl/article/details/121399583. Challenges of ML Training DSA: Inference to Training. More computation: "more" means both the types..
Original 2021-11-19 12:28:33 · 728 views · 0 comments
TPUv2/3 Multi-Chip Parallelized DL DSA
unit is
Inference vs. Training: Both share some computational elements, including matrix multiplications, convolutions, and activation functions, so inference and training DSAs might have similar functional units. Key architectural aspects where the requi.
Original 2021-11-18 20:52:31 · 251 views · 0 comments
Systolic Array
Computer Architecture: Dataflow/Systolic Arrays; https://en.wikipedia.org/wiki/Systolic_array; Kung, H.T. and Leiserson, C.E. Algorithms for VLSI processor arrays. Chapter in Introduction to VLSI Systems by C. Mead and L. Conway. Addison-Wesley, Reading, M
Original 2021-11-18 13:52:26 · 1182 views · 0 comments
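A cycle-level simulation makes the systolic idea concrete: each processing element (PE) only talks to its neighbours, and operands are skewed in time so that matching pairs meet at the right PE. The sketch below assumes an output-stationary dataflow (PE(i, j) accumulates C[i][j], rows of A flow rightward, columns of B flow downward); other dataflows, such as weight-stationary, keep a different operand fixed. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Output-stationary systolic array computing A (n x m) @ B (m x p)."""
    n, m, p = len(A), len(B), len(B[0])
    acc = [[0] * p for _ in range(n)]    # C[i][j] stays inside PE(i, j)
    a_reg = [[0] * p for _ in range(n)]  # A operand latched in each PE
    b_reg = [[0] * p for _ in range(n)]  # B operand latched in each PE
    for t in range(n + m + p - 2):       # last product meets at t = n + m + p - 3
        # Visit PEs bottom-right first so neighbours still hold last cycle's values.
        for i in reversed(range(n)):
            for j in reversed(range(p)):
                if j > 0:
                    a_in = a_reg[i][j - 1]       # from the left neighbour
                else:
                    k = t - i                    # row i is skewed by i cycles
                    a_in = A[i][k] if 0 <= k < m else 0
                if i > 0:
                    b_in = b_reg[i - 1][j]       # from the upper neighbour
                else:
                    k = t - j                    # column j is skewed by j cycles
                    b_in = B[k][j] if 0 <= k < m else 0
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in         # multiply-accumulate
    return acc
```

With the skew, A[i][k] and B[k][j] both reach PE(i, j) at cycle i + j + k, so each PE only ever multiplies matching operands.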
TPUv1: Single Chipped Inference DL DSA
From: A Domain-Specific Architecture for Deep Neural Networks | September 2018 | Communications of the ACM
Original 2021-11-17 16:43:18 · 278 views · 0 comments
TPU: DL DSA
2020 IEEE International Solid-State Circuits Conference: The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design. Incentives: 1. the various improvements in NN-AI performance in tasks across multiple areas of endeavor wi
Original 2021-11-17 16:36:17 · 156 views · 0 comments
Very Long Instruction Word
From: https://en.wikipedia.org/wiki/Very_long_instruction_word. The traditional means to improve performance in processors include dividing instructions into substeps so the instructions can be executed partly at the same time (termed pipelining), dispatch..
Reposted 2021-11-08 13:57:09 · 215 views · 0 comments
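In a VLIW design the compiler, not the hardware, decides which operations issue together. A minimal sketch of that packing step: place each operation in the earliest fixed-width bundle after all bundles that produce its inputs. It assumes SSA-style code (each register written once, so only read-after-write dependences matter) and a one-bundle latency for every operation; the `bundle` function and the op encoding are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, Tuple[str, ...]]  # (op name, destination register, source registers)

def bundle(ops: List[Op], width: int) -> List[List[str]]:
    """Greedily pack ops into VLIW bundles of at most `width` slots."""
    ready_at: Dict[str, int] = {}   # register -> first bundle where its value is usable
    bundles: List[List[str]] = []
    for name, dst, srcs in ops:
        # Earliest legal bundle: after every producer of a source operand.
        slot = max((ready_at.get(s, 0) for s in srcs), default=0)
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1               # bundle full: try the next one
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(name)
        ready_at[dst] = slot + 1    # one-bundle latency assumption
    return bundles

program = [
    ("ld1", "r1", ("x",)),
    ("ld2", "r2", ("y",)),
    ("add", "r3", ("r1", "r2")),
    ("ld3", "r4", ("z",)),
    ("mul", "r5", ("r3", "r4")),
]
```

With two slots per bundle, the independent `ld3` is hoisted next to `add`; with four slots it fits alongside the first two loads, showing how the width of the instruction word limits the parallelism the compiler can expose.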