Simulation Study of Joint Latency and Privacy Optimization Based on Deep Reinforcement Learning
1. Online Deep Reinforcement Learning Algorithm
To solve the offloading decision problem, an online deep reinforcement learning algorithm is proposed. Its pseudocode is as follows:
Algorithm 7 Online deep RL algorithm to solve the offloading decision problem
Input: Wireless channel gain h
Output: x, v, b, a that maximize Q(h, x, v, b, a)
1: Initialize DNNs’ parameters
2: for t in T do
3:     Select the right values of K1 and K2 according to t
4:     Select a suitable batch size e and learning interval δ
5:     Input h to the DNNs to get x_t and v_t
6:     Generate spare x_{k1} and v_{k2}
7:     Compute Q for all x_{k1} and v_{k2} by solving P2
8:     Select (x*_{k1}, v*_{k2}) that maximize Q
9:     Use (h, x*_{k1}, v*_{k2}) to update the training set
10:    if t > e AND Remainder(t/δ) = 0 then
11:        Ra