Computer Organization and Design -- Computer Organization Homework Exercises (6)

This post evaluates cache performance by simulating different cache configurations -- fully-associative, direct-mapped, and set-associative -- and analyzing key metrics such as hit rate and replacement behavior.


Computer Organization and Design

 

---------------------- Personal homework. If later students are assigned the same exercises, you are welcome to use this for reference and discussion, but please do not copy it directly.

 

 

 Problem 2. Cache Associativity (8 points)

 

For this question, you will simulate different configurations of an 8-block cache using the following
memory accesses (addresses are 8 bits):

     load   100
     store  102
     store  88
     load   120
     load   90

 

a) Simple Questions

     i) If the block size is 4 bytes, how large must the cache be?

----------32 bytes ;

     ii) How many bits will be used for the offset?

----------2 bits will be used for the offset ;
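As a quick sanity check on parts (a.i) and (a.ii), a minimal Python sketch of the arithmetic (variable names are illustrative, not from the assignment):

```python
import math

num_blocks = 8    # the cache holds 8 blocks
block_size = 4    # 4 bytes per block

cache_size = num_blocks * block_size        # 8 * 4 = 32 bytes
offset_bits = int(math.log2(block_size))    # log2(4) = 2 offset bits

print(cache_size, offset_bits)              # 32 2
```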

b) Fully-associative with a 4-byte block size

    i) How many bits should be used for the tag?

---------- 6 bits should be used for the tag (8-bit address - 2 offset bits, no index bits) ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

     

 

 

Block# | Tag | Dirty
-------|-----|------
0      | 25  | 0->1
1      | 22  | 1
2      | 30  |
3      |     |
4      |     |
5      |     |
6      |     |
7      |     |

 

c) Direct-mapped with a 4-byte block size

    i) How many bits should be used for the tag?

------------ 3 bits should be used for the tag (8 - 3 index bits - 2 offset bits) ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

 

 

Block/Set# | Tag     | Dirty
-----------|---------|--------
0          |         |
1          | 3       | 0->1
2          |         |
3          |         |
4          |         |
5          |         |
6          | 2->3->2 | 1->0->0
7          |         |

 

 

 

 

 

d) 2-way Set-associative with a 4-byte block size

    i) How many bits should be used for the tag?

------------ 4 bits should be used for the tag (8 - 2 index bits - 2 offset bits) ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

 

Set# | Block# | Tag | Dirty
-----|--------|-----|------
0    | 0      |     |
0    | 1      |     |
1    | 2      | 6   | 0->1
1    | 3      |     |
2    | 4      | 5   | 1
2    | 5      | 7   | 0
3    | 6      |     |
3    | 7      |     |

 

 

e) Matching
     Note: Some may have multiple answers and some answers may not be used.

__bd__ fully-associative cache            a) Fewest connections to memory are needed
__ae__ direct-mapped cache                b) Provides optimal cache utilization
__f___ set-associative cache              c) Allows a larger block size to be used
                                          d) Largest tag overhead
                                          e) Smallest tag overhead
                                          f) May prevent a conflict when two blocks map to the same set
                                             (N/A to fully-associative caches)
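To double-check the tables in parts (b)-(d), here is a minimal Python sketch (not part of the assignment; the function and variable names are my own) that replays the five accesses through a write-back cache of 8 blocks with 4-byte blocks. With assoc=8 it models the fully-associative case, with assoc=1 the direct-mapped case, and with assoc=2 the 2-way case; it reports only final tags and dirty bits, whereas the tables above also show the intermediate transitions.

```python
def simulate(accesses, num_blocks=8, block_size=4, assoc=8):
    """Replay (op, address) pairs through an LRU, write-back cache; return final sets."""
    num_sets = num_blocks // assoc
    sets = [[] for _ in range(num_sets)]          # per set: blocks, most recently used last
    for op, addr in accesses:
        block = addr // block_size
        idx, tag = block % num_sets, block // num_sets
        ways = sets[idx]
        entry = next((w for w in ways if w["tag"] == tag), None)
        if entry is None:                         # miss: evict the LRU block if the set is full
            if len(ways) == assoc:
                ways.pop(0)
            entry = {"tag": tag, "dirty": False}
        else:                                     # hit: refresh LRU position
            ways.remove(entry)
        ways.append(entry)
        if op == "store":
            entry["dirty"] = True                 # write-back: just mark the block dirty
    return sets

trace = [("load", 100), ("store", 102), ("store", 88), ("load", 120), ("load", 90)]
for assoc in (8, 1, 2):   # fully associative, direct-mapped, 2-way
    print(assoc, simulate(trace, assoc=assoc))
```

For the direct-mapped run, for example, the printout shows set 6 ending with tag 2 and a clean block, matching the 2->3->2 / 1->0->0 transitions in the table above.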

Problem 3. Cache Comparison (8 Points)

EZ-Cache Company has hired you to design their next-generation cache for the LC2k. They want to use a 32-byte direct-mapped cache with a 4-byte block size.

 

a) Using the following sequence of memory accesses, compute the number of cache hits:

     load  200

     store 204

     store 236

     load  201

     load  208

     store 234

     load  204

     load  239

     store 201

 

Block/Set# | Tag        | Dirty
-----------|------------|------
0          |            |
1          |            |
2          | 6->7->6    |
3          | 6->7->6->7 |
4          | 6          |
5          |            |
6          |            |
7          |            |

 

 

Number of Hits: ________1________

 

b) EZ-Cache believes they can improve the hit rate by either increasing the block size or increasing the associativity

 

Simulate the above memory accesses for a 32-byte direct-mapped cache with an 8-byte block size

 

Block/Set# | Tag
-----------|--------------------
0          |
1          | 6->7->6->7->6->7->6
2          | 6
3          |

 

 

Number of Hits: _________1________

 

Simulate the above memory accesses for a 32-byte 2-way set-associative cache with a 4-byte block size

 

 

 

Set# | Block# | Tag
-----|--------|----
0    | 0      | 13
0    | 1      |
1    | 2      |
1    | 3      |
2    | 4      | 12
2    | 5      | 14
3    | 6      | 12
3    | 7      | 14

 

Number of Hits ________4__________

 

 

Which one of the configurations is best for the memory access sequence?

 

----------- The 32-byte 2-way set-associative cache with 4-byte blocks (4 hits, versus 1 hit for each direct-mapped configuration) ;
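As a cross-check of the three hit counts above, a short Python sketch (not part of the assignment; names are my own) that counts hits in an LRU set-associative cache:

```python
def count_hits(addresses, cache_bytes, block_size, assoc):
    """Count hits for a sequence of byte addresses in an LRU set-associative cache."""
    num_sets = cache_bytes // block_size // assoc
    sets = [[] for _ in range(num_sets)]   # per set: resident tags, most recent last
    hits = 0
    for addr in addresses:
        block = addr // block_size
        idx, tag = block % num_sets, block // num_sets
        ways = sets[idx]
        if tag in ways:
            hits += 1
            ways.remove(tag)               # refresh LRU position
        elif len(ways) == assoc:
            ways.pop(0)                    # miss in a full set: evict the LRU tag
        ways.append(tag)
    return hits

refs = [200, 204, 236, 201, 208, 234, 204, 239, 201]
print(count_hits(refs, 32, 4, 1))   # direct-mapped, 4-byte blocks       -> 1
print(count_hits(refs, 32, 8, 1))   # direct-mapped, 8-byte blocks       -> 1
print(count_hits(refs, 32, 4, 2))   # 2-way set-associative, 4-byte blocks -> 4
```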

 

 

 

Problem 4 (12 points)

 

You have been given the following two caches, both in a byte-addressable machine that uses 16-bit memory addresses.

 

 

Cache              | Cache A           | Cache B
-------------------|-------------------|------------------
Total size (Bytes) | 16                | 16
Block size (Bytes) | 4                 | 4
Organization       | Fully Associative | Direct Mapped
Replacement policy | LRU               | -
Write policy       | Allocate on write | Allocate on write

 

 

 

 

 

 

 

 

 

a) The following addresses are referenced in the given order; put an H for each hit and an M for each miss for both caches. Also calculate the hit rate for each cache. An extra column for an infinite-size fully-associative cache (also with a block size of 4 bytes) is given to make the calculation for part (b) easier. [8 pts]

 

Address (hex) | Address (binary)    | Infinite | Cache A | Cache B
--------------|---------------------|----------|---------|--------
0x0000        | 0000 0000 0000 0000 | M        | M       | M
0x0007        | 0000 0000 0000 0111 | M        | M       | M
0x0003        | 0000 0000 0000 0011 | H        | H       | H
0x0009        | 0000 0000 0000 1001 | M        | M       | M
0x0016        | 0000 0000 0001 0110 | M        | M       | M
0x0005        | 0000 0000 0000 0101 | H        | H       | M
0x000D        | 0000 0000 0000 1101 | M        | M       | M
0x0001        | 0000 0000 0000 0001 | H        | M       | H
Hit Rate      |                     | 3/8      | 2/8     | 2/8

 

-----------------(2/8+2/8)*2*8=8 ;

 

b) For each reference in the previous sequence, classify it with one of four labels: HIT (if the access is a hit), or COMPULSORY / CAPACITY / CONFLICT (if it is a miss, depending on the type of miss). [4 pts]

 

Address (hex) | Cache A    | Cache B
--------------|------------|-----------
0x0000        | COMPULSORY | COMPULSORY
0x0007        | COMPULSORY | COMPULSORY
0x0003        | HIT        | HIT
0x0009        | COMPULSORY | COMPULSORY
0x0016        | COMPULSORY | COMPULSORY
0x0005        | HIT        | CONFLICT
0x000D        | COMPULSORY | COMPULSORY
0x0001        | CAPACITY   | HIT
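A sketch of how the labels above can be derived mechanically (the helper below is my own, not part of the assignment): a reference is a HIT if it hits in the cache being modelled; otherwise it is COMPULSORY if the block has never been touched, CAPACITY if even a fully-associative LRU cache of the same size would miss, and CONFLICT otherwise.

```python
def classify(addresses, cache_blocks, block_size=4, direct_mapped=False):
    """Label each reference HIT / COMPULSORY / CAPACITY / CONFLICT."""
    seen = set()      # every block ever touched (the 'infinite' cache)
    full_lru = []     # fully-associative LRU cache of cache_blocks blocks
    dm = {}           # index -> block, for the direct-mapped cache
    labels = []
    for addr in addresses:
        block = addr // block_size
        idx = block % cache_blocks
        hit = (dm.get(idx) == block) if direct_mapped else (block in full_lru)
        if hit:
            labels.append("HIT")
        elif block not in seen:
            labels.append("COMPULSORY")
        elif block not in full_lru:
            labels.append("CAPACITY")     # a same-size fully-associative cache misses too
        else:
            labels.append("CONFLICT")     # only the mapping restriction caused the miss
        # update the reference caches
        seen.add(block)
        if block in full_lru:
            full_lru.remove(block)
        elif len(full_lru) == cache_blocks:
            full_lru.pop(0)
        full_lru.append(block)
        dm[idx] = block
    return labels

refs = [0x0000, 0x0007, 0x0003, 0x0009, 0x0016, 0x0005, 0x000D, 0x0001]
print(classify(refs, 4))                       # Cache A (fully associative, LRU)
print(classify(refs, 4, direct_mapped=True))   # Cache B (direct mapped)
```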

 

 

 

 

 

 

 

 

 

 

 

 

 

------------0.25*8*2=4 ;

Problem 5 (10 points)

The picojoule microprocessor has a byte-addressable ISA and only 64 bytes of memory. It has a 16-byte, 2-way set-associative, write-back, write-allocate cache with a block size of 2 bytes. Each load/store instruction accesses a single byte. The OB0 and OB1 fields in the cache hold the 2 data bytes in a block (1 byte each). Given the following sequence of instructions, update the cache after each instruction. When both ways in a set are invalid and a block has to be allocated, the cache logic gives higher priority to way 0. Use decimal values for the OB0 and OB1 fields and binary for the rest. If a cache block is invalid you don't have to fill in anything; if you don't know the value of a certain field for a valid block, put an X there. The initial (empty) state of the cache is given. The contents of the following memory locations are known:

 

 

M[4]=7

M[14]=11

M[15]=13

M[37]=17

M[45]=19

 

 

The instructions (LD is a load and ST is a store) follow:

 

 

1: LD R1 ← M[4]

2: LD R2 ← M[37]

3: ST R1 → M[36]

4: ST R2 → M[5]

5: LD R1 ← M[15]

6: LD R2 ← M[14]

7: LD R1 ← M[45]

8: ST R2 → M[44]

9: HALT
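Before filling in the tables below, it helps to split each referenced address into its fields: with 6-bit addresses, 2-byte blocks, and 4 sets, that is a 3-bit tag, a 2-bit set index, and a 1-bit byte offset. A minimal Python sketch of that breakdown (the helper is my own, not part of the assignment):

```python
def fields(addr):
    """Split a 6-bit picojoule address into (tag, set index, byte offset)."""
    offset = addr & 0b1           # 1 bit: which byte within the 2-byte block
    index = (addr >> 1) & 0b11    # 2 bits: which of the 4 sets
    tag = addr >> 3               # remaining 3 bits
    return "tag={:03b} set={} offset={}".format(tag, index, offset)

for addr in (4, 37, 36, 5, 15, 14, 45, 44):
    print(addr, fields(addr))
# 4  -> tag=000 set=2 offset=0    37 -> tag=100 set=2 offset=1    36 -> tag=100 set=2 offset=0
# 5  -> tag=000 set=2 offset=1    15 -> tag=001 set=3 offset=1    14 -> tag=001 set=3 offset=0
# 45 -> tag=101 set=2 offset=1    44 -> tag=101 set=2 offset=0
```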

 

 

 

Part (a) [8 points]

 

 

 

Initial

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 3 | 0 |   |     |     |     |     | 0 |   |     |     |     |

 

 

 

 

 

 

 

 

After instruction 1 (LD R1 ← M[4])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 0 |     | 000 | 7   | X   | 0 |   |     |     |     |
Set 3 | 0 |   |     |     |     |     | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 2 (LD R2 ← M[37])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 0 | LRU | 000 | 7   | X   | 1 | 0 |     | 100 | X   | 17
Set 3 | 0 |   |     |     |     |     | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 3 (ST R1 → M[36])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 0 | LRU | 000 | 7   | X   | 1 | 1 |     | 100 | 7   | 17
Set 3 | 0 |   |     |     |     |     | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 4 (ST R2 → M[5])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 1 |     | 000 | 7   | 17  | 1 | 1 | LRU | 100 | 7   | 17
Set 3 | 0 |   |     |     |     |     | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 5 (LD R1 ← M[15])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 1 |     | 000 | 7   | 17  | 1 | 1 | LRU | 100 | 7   | 17
Set 3 | 1 | 0 |     | 001 | 11  | 13  | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 6 (LD R2 ← M[14])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 1 |     | 000 | 7   | 17  | 1 | 1 | LRU | 100 | 7   | 17
Set 3 | 1 | 0 |     | 001 | 11  | 13  | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 7 (LD R1 ← M[45])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 1 | LRU | 000 | 7   | 17  | 1 | 0 |     | 101 | X   | 19
Set 3 | 1 | 0 |     | 001 | 11  | 13  | 0 |   |     |     |     |

 

 

 

 

 

 

After instruction 8 (ST R2 → M[44])

      | Way 0                         | Way 1
      | V | D | lru | Tag | OB0 | OB1 | V | D | lru | Tag | OB0 | OB1
------+---+---+-----+-----+-----+-----+---+---+-----+-----+-----+----
Set 0 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 1 | 0 |   |     |     |     |     | 0 |   |     |     |     |
Set 2 | 1 | 1 | LRU | 000 | 7   | 17  | 1 | 1 |     | 101 | 11  | 19
Set 3 | 1 | 0 |     | 001 | 11  | 13  | 0 |   |     |     |     |

 

 

 

 

 

 

 

 

 

Part (b):

In total, how many bytes are written to memory while executing instructions 1 to 8 (including instruction 8)? How many more bytes will have to be written to memory after HALT is executed? [2 points]

 

------------ In total, 2 bytes are written to memory while executing instructions 1 to 8: instruction 7 evicts the dirty block holding M[36]/M[37], which is written back ;
2x2 = 4 more bytes will have to be written to memory after HALT is executed, because two dirty blocks (M[4]/M[5] and M[44]/M[45]) remain in the cache.
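For completeness, a Python sketch (not required by the problem; helper names are my own) of the whole picojoule cache with LRU, dirty bits, and write-back counting. Memory bytes not listed in the problem default to 0 here purely so the code runs; they correspond to the X entries in the tables. It reproduces the final table above and the byte counts in part (b).

```python
memory = [0] * 64                      # unknown bytes default to 0 (shown as X in the tables)
for a, v in {4: 7, 14: 11, 15: 13, 37: 17, 45: 19}.items():
    memory[a] = v

cache = [[None, None] for _ in range(4)]   # 4 sets x 2 ways
regs = {}
written = 0                                # bytes written back to memory so far

def access(addr, store_val=None):
    """One byte load/store through the 2-way, write-back, write-allocate cache."""
    global written
    tag, idx, off = addr >> 3, (addr >> 1) & 3, addr & 1
    ways = cache[idx]
    way = next((i for i, w in enumerate(ways) if w and w["tag"] == tag), None)
    if way is None:                                    # miss
        way = next((i for i, w in enumerate(ways) if w is None), None)  # prefer an invalid way (way 0 first)
        if way is None:                                # both valid: evict the LRU way
            way = next(i for i, w in enumerate(ways) if w["lru"])
            victim = ways[way]
            if victim["dirty"]:                        # write back both bytes of the dirty victim
                base = (victim["tag"] << 3) | (idx << 1)
                memory[base], memory[base + 1] = victim["bytes"]
                written += 2
        base = (tag << 3) | (idx << 1)                 # allocate the new block
        ways[way] = {"tag": tag, "dirty": False, "lru": False,
                     "bytes": [memory[base], memory[base + 1]]}
    blk = ways[way]
    blk["lru"] = False                                 # this way was just used ...
    if ways[1 - way]:
        ways[1 - way]["lru"] = True                    # ... so the other valid way is now the LRU one
    if store_val is None:
        return blk["bytes"][off]
    blk["bytes"][off] = store_val
    blk["dirty"] = True

regs["R1"] = access(4)        # 1: LD R1 <- M[4]
regs["R2"] = access(37)       # 2: LD R2 <- M[37]
access(36, regs["R1"])        # 3: ST R1 -> M[36]
access(5, regs["R2"])         # 4: ST R2 -> M[5]
regs["R1"] = access(15)       # 5: LD R1 <- M[15]
regs["R2"] = access(14)       # 6: LD R2 <- M[14]
regs["R1"] = access(45)       # 7: LD R1 <- M[45]  (evicts the dirty M[36]/M[37] block: +2 bytes)
access(44, regs["R2"])        # 8: ST R2 -> M[44]

print("bytes written back during instructions 1-8:", written)      # 2
dirty_blocks = sum(w["dirty"] for s in cache for w in s if w)
print("bytes still to write after HALT:", 2 * dirty_blocks)        # 4
```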

 

Problem 6 (8 points)

A workload with the following instruction mix is run on two processor designs, both of which have an I-Cache and a D-Cache.

 

ADD 10%

NAND 20%

BEQ 25%

SW 15%

LW 30%

 

Additionally, it is known that the I-Cache hit rate is 90%, the D-Cache hit rate is 98%, 45% of branches are not taken, and 25% of LW instructions are followed by a dependent instruction. Memory takes 75 nanoseconds to access.

 

a) Assuming the above code is run on a standard LC-2K 5-stage pipeline design processor with forwarding and with branches predicted not taken and clocked at 200MHz, what is the CPI? Show your work. [3 points]

 

Clock period: 1 / 200 MHz = 5 ns
Cache miss penalty: 75 ns / 5 ns = 15 cycles
CPI = 1                              (base)
    + 1 * 0.10 * 15                  (I-Cache misses: every instruction is fetched, 10% miss rate)
    + (0.30 + 0.15) * 0.02 * 15      (D-Cache misses: LW + SW references, 2% miss rate)
    + 0.30 * 0.25 * 1                (load-use stall: 25% of LW, 1 bubble)
    + 0.25 * 0.55 * 3                (taken branches mispredicted: 3 squashed instructions)
    = 3.1225

 

 

b) This five-stage pipeline is extended to a similar 15-stage pipeline with no additional hazards introduced. The number of stall cycles needed for a LW followed by a dependent instruction does not change. The new frequency is 400 MHz. The same code is now run on the 15-stage pipeline, where branches are resolved in the 11th stage.

I. What is the new CPI? Show your work. [4 pts]

II. Does this new design result in better performance for this workload? [1 pt]

 

I:
Branch penalty: branches resolve in stage 11, so 11 - 1 = 10 squashed instructions
Clock period: 1 / 400 MHz = 2.5 ns
Cache miss penalty: 75 ns / 2.5 ns = 30 cycles
CPI = 1 + 1*0.10*30 + (0.30+0.15)*0.02*30 + 0.30*0.25*1 + 0.25*0.55*10 = 5.72

II:
(a): 5 ns * 3.1225 = 15.6125 ns per instruction ;
(b): 2.5 ns * 5.72 = 14.3 ns per instruction ;
Yes, the new design results in better performance for this workload.
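The same arithmetic as a short Python check (constants mirror the numbers above; the helper name is my own):

```python
lw, sw, beq = 0.30, 0.15, 0.25            # instruction mix fractions
icache_miss, dcache_miss = 0.10, 0.02     # miss rates
taken, dep_after_lw = 0.55, 0.25          # 55% of branches taken; 25% of LW have a dependent use
mem_ns = 75.0                             # memory access time

def cpi(clock_ns, branch_penalty):
    miss_cycles = mem_ns / clock_ns                    # miss penalty in cycles
    return (1
            + 1.0 * icache_miss * miss_cycles          # every instruction is fetched
            + (lw + sw) * dcache_miss * miss_cycles    # data accesses by LW and SW
            + lw * dep_after_lw * 1                    # load-use stall
            + beq * taken * branch_penalty)            # squashed instructions on taken branches

cpi5, cpi15 = cpi(5.0, 3), cpi(2.5, 10)
print(cpi5, cpi15)                # 3.1225  5.72
print(5.0 * cpi5, 2.5 * cpi15)    # 15.6125 ns vs 14.3 ns per instruction
```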

 

Reposted from: https://www.cnblogs.com/nanashi/p/6662279.html
