
System_Design
Design case studies and principles of large-scale software and hardware systems
EverNoob
simply bumping around
(WIP) Network Paradigm Fundamentals and Comparison
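In distributed storage networks, the protocols we use are RoCE, InfiniBand (IB), and TCP/IP. RoCE and IB are both RDMA (Remote Direct Memory Access) technologies. How do they differ from traditional TCP/IP? A detailed comparison follows.
Original · 2024-07-19 10:20:48 · 1129 reads · 0 comments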
12 Software Architecture Pitfalls and How to Avoid Them
Good luck! your.
Repost · 2023-12-26 12:29:23 · 133 reads · 0 comments
Cross Domain Signal Integrity in Asynchronous Designs
Conventional two flip-flop synchronizer, from "Synchronizer Techniques for Multi-Clock Domain SoCs & FPGAs - EDN". In general, a conventional two flip-flop synchronizer is used for synchronizing a single-bit level signal. As shown in Figure 1 and Figure 2, fli
Original · 2023-04-22 18:50:56 · 666 reads · 0 comments
Common architectures in convolutional neural networks
from: https://www.jeremyjordan.me/convnet-architectures/#lenet5 ==> most of the graphs cannot be copied to this platform, so just check the linked original. In this post, I'll discuss commonly used architectures for convolutional networks. As you'll see, almo
Repost · 2023-02-22 18:56:56 · 219 reads · 0 comments
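Domain Specific Compiling: The Past and Present of Domain-Specific Compilers • Compilation Techniques for AI
About the author: Zhang Shuoming is a PhD student pursuing a doctorate in computer architecture at the Institute of Computing Technology, Chinese Academy of Sciences, under the supervision of Researcher Cui Huimin; the author's current main research direction is AI compilation. zhangshuoming17@mails.ucas.ac.cn This article is divided into two parts. The first part is a survey (The Past and Present of Domain-Specific Compilers • Survey); this part focuses on compilation techniques for the AI domain. 0. Preface: With the arrival of the artificial intelligence era, the emergence of large numbers of AI applications is also driving the development of domain-specific compilation, most visibly in the spread and adoption of many AI compilers. Several important characteristics of the AI domain present AI compilers with many new opportunities and challenges: first, programming in the AI domain
Repost · 2023-02-21 18:59:50 · 788 reads · 0 comments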
Python: Function Annotation and "inspect" module
https://peps.python.org/pep-3107/ This PEP introduces a syntax for adding arbitrary metadata annotations to Python functions [1]. Because Python's 2.x series lacks a standard way of annotating a function's parameters and return values, a variety of tools and
Repost · 2022-09-14 18:48:53 · 215 reads · 0 comments
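A minimal sketch of what the PEP 3107 annotation syntax looks like and how the standard `inspect` module reads it back. The function `scale` and its parameter names are hypothetical, made up for illustration only; they do not come from the linked post.

```python
import inspect


# Hypothetical example function: an annotation may be a type or any other object.
def scale(value: float, factor: "multiplier applied to value" = 2) -> float:
    return value * factor


# The raw annotations live in a plain dict on the function object.
annotations = scale.__annotations__

# inspect.signature exposes the same data per parameter, together with defaults.
signature = inspect.signature(scale)
factor_param = signature.parameters["factor"]

print(annotations)
print(factor_param.annotation, factor_param.default)
print(signature.return_annotation)
```

Python itself attaches no meaning to these annotations; any interpretation is left to whatever tool reads them.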
Python Multi-level Import with '.'
[Code] Python Multi-level Import.
Repost · 2022-09-14 13:54:36 · 153 reads · 0 comments
Python Multiprocessing
official documentation: multiprocessing - Process-based parallelism - Python 3.10.4 documentation. breakdown of the API: Python multiprocessing - process-based parallelism in Python. Multiprocessing vs. Threading in Python: What you need to know. What.
Original · 2022-04-25 18:29:53 · 927 reads · 0 comments
AXI Protocol and AMBA AXI
The important one for hardware is AMBA AXI, which is the Advanced Microcontroller Bus Architecture (AMBA) Advanced eXtensible Interface (AXI) from Arm. See Arm's own documentation for a controlled learning experience: Documentation - Arm Developer. for a comprehensive coverage of all
Repost · 2022-04-18 17:00:08 · 982 reads · 0 comments
VPC: Virtual Private Cloud
Overview. From Wikipedia, the free encyclopedia: Virtual private cloud (VPC) is an on-demand configurable pool of shared resources allocated within a public cloud environment, providing a certain level of isolation between the different organizations (de...
Repost · 2022-04-12 09:47:40 · 247 reads · 0 comments
Minifloats: FP Types for DNNs
https://en.wikipedia.org/wiki/Minifloat In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes, most often in...
Original · 2022-03-24 12:59:55 · 2709 reads · 0 comments
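To make "very few bits" concrete, here is a rough sketch that decodes one 8-bit pattern into a Python float. The layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, IEEE-style subnormals and infinities) is an assumption chosen for illustration; real minifloat formats differ in field widths, bias, and special-value handling, and `decode_minifloat` is a hypothetical helper, not a library function.

```python
def decode_minifloat(byte, exp_bits=4, man_bits=3, bias=7):
    """Decode sign | exponent | mantissa bits (assumed IEEE-like layout)."""
    sign = -1.0 if (byte >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = byte & ((1 << man_bits) - 1)
    fraction = mantissa / (1 << man_bits)

    if exponent == (1 << exp_bits) - 1:
        # All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed minimum exponent.
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal number: implicit leading 1 in front of the mantissa bits.
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


print(decode_minifloat(0b0_0111_000))  # sign 0, exponent 7, mantissa 0
print(decode_minifloat(0b0_1110_111))  # largest finite value in this layout
```

With only 3 mantissa bits, neighbouring representable values are far apart, which is why such formats suit error-tolerant workloads rather than general numerical work.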
Windows Path Length Limits and Workaround
from Maximum Path Length Limitation - Win32 apps | Microsoft Docs. Maximum Path Length Limitation: In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 character..
Repost · 2022-03-23 11:17:32 · 330 reads · 0 comments
Integrated Circuits: An Introduction to Industry and Technology Categories
IC Overview: https://en.wikipedia.org/wiki/Integrated_circuit Design Overview: https://en.wikipedia.org/wiki/Integrated_circuit_design Fabrication Overview: IC Fabrication Process - Javatpoint. Textbook: http://www.ime.cas.cn/icac/learning/learning_3/201
Original · 2022-02-10 15:30:10 · 1702 reads · 0 comments
Arm vs. x86
Arm vs x86: Instruction sets, architecture, and more differences explained. Android is capable of running on three different types of processor architecture: Arm, Intel, and MIPS. The former is today's ubiquitous architecture after Intel abandoned its hand
Repost · 2022-02-10 14:54:49 · 238 reads · 0 comments
C and Cpp overloading
Cpp Overload Resolution: Overload resolution - cppreference.com. In order to compile a function call, the compiler must first perform name lookup, which, for functions, may involve argument-dependent lookup, and for function templates may be followed by t...
Repost · 2021-12-30 10:04:16 · 107 reads · 0 comments
int8 quantization in DNN
from: What Is int8 Quantization and Why Is It Popular for Deep Neural Networks? - MATLAB & Simulink. What Is int8 Quantization and Why Is It Popular for Deep Neural Networks? By Ram Cherukuri, MathWorks. Deep learning deployment on the edge for
Repost · 2021-12-28 11:08:42 · 299 reads · 0 comments
NCHW vs. NHWC
The format names describe the storage scheme exactly: NCHW is consecutive in image (HW), then channel, while NHWC is consecutive in point-expansion (C), then image. example: "TensorFlow performance and advance topics". explanation: gpu - How much faster .
Original · 2021-11-29 11:06:51 · 404 reads · 0 comments
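The ordering described above can be written out as flat-offset arithmetic. This is an illustrative sketch only: the helper names `offset_nchw` and `offset_nhwc` and the tiny tensor shape are assumptions, not part of any framework API.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index when W varies fastest, then H, then C, then N."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index when C varies fastest, then W, then H, then N."""
    return ((n * H + h) * W + w) * C + c


# Assumed toy shape: 3 channels, 2 x 2 image.
C, H, W = 3, 2, 2

# Stepping to the next pixel in the same channel:
print(offset_nchw(0, 0, 0, 1, C, H, W), offset_nhwc(0, 0, 0, 1, C, H, W))
# Stepping to the next channel at the same pixel:
print(offset_nchw(0, 1, 0, 0, C, H, W), offset_nhwc(0, 1, 0, 0, C, H, W))
```

In NCHW the next pixel of one channel is adjacent in memory and the next channel is a whole H*W plane away; in NHWC all channels of one pixel sit side by side and the next pixel is C elements away.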
NUMA Collections
Simple intro: wiki entry. Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory...
Original · 2021-11-19 20:58:47 · 326 reads · 0 comments
TPUv4/4i: 4th Generation DL DSA
from Ten Lessons From Three Generations Shaped Google's TPUv4i. Evolution of ML DSA: for TPUv1 see TPUv1: Single Chipped Inference DL DSA_maxzcl's blog - 优快云 blog; for TPUv2/3 see https://blog.youkuaiyun.com/maxzcl/article/details/121399583; for TPUv1 to TPUv2 see TPUv...
Original · 2021-11-19 21:33:59 · 1466 reads · 0 comments
TPUv2/v3 Design Process
The Design Process for Google's Training Chips: TPUv2 and TPUv3. breakdown of the accompanying paper: https://blog.youkuaiyun.com/maxzcl/article/details/121399583 Challenges of ML Training DSA. Inference to Training. More computation: More means both the types..
Original · 2021-11-19 12:28:33 · 728 reads · 0 comments
TPUv2/3 Multi-Chip Parallelized DL DSA
unit is Inference vs. Training: Both share some computational elements including matrix multiplications, convolutions, and activation functions, so inference and training DSAs might have similar functional units. Key architectural aspects where the requi.
Original · 2021-11-18 20:52:31 · 251 reads · 0 comments
Systolic Array
Computer Architecture: Dataflow/Systolic Arrays. https://en.wikipedia.org/wiki/Systolic_array Kung, H.T. and Leiserson, C.E. Algorithms for VLSI processor arrays. Chapter in Introduction to VLSI systems by C. Mead and L. Conway. Addison-Wesley, Reading, M
Original · 2021-11-18 13:52:26 · 1181 reads · 0 comments
DDR vs. HBM
from Will HBM replace DDR and become Computer Memory? - Utmel. conclusion; main comparison ==> 3D stacking structure is the main reason why HBM can give higher bandwidth. HBM is the preferred DRAM NN-DSA. Other Articles: Choosing between D..
Original · 2021-11-17 17:30:32 · 2122 reads · 0 comments
Very Long Instruction Word
from: https://en.wikipedia.org/wiki/Very_long_instruction_word The traditional means to improve performance in processors include dividing instructions into substeps so the instructions can be executed partly at the same time (termed pipelining), dispatch..
Repost · 2021-11-08 13:57:09 · 215 reads · 0 comments
Dev. Checklist
Before Dev.: what knowledge is necessary? where can I get the reading material? have you learnt it all? do you know it in detail? What is the dev. environment? have/have not? update necessary? do you need to learn it? who ...
Original · 2021-11-04 11:35:55 · 121 reads · 0 comments
Asynchronous Iterator and Generator (in JS)
from Async iteration and generators. Asynchronous iteration allows us to iterate over data that comes asynchronously, on-demand. Like, for instance, when we download something chunk-by-chunk over a network. And asynchronous generators make it even mor.
Repost · 2021-10-22 09:01:31 · 156 reads · 0 comments
Edge Computing
Tech. Review: What Is Edge Computing? Everything You Need to Know. Edge computing is a distributed information technology (IT) architecture in which client data is processed at the periphery of the network, as close to the originating source as possible.
Original · 2021-10-20 16:08:26 · 200 reads · 0 comments