AutonomousDrivingCookbook from the Microsoft team

This post walks through the Autonomous Driving Cookbook, focusing on the end-to-end deep learning tutorial and on distributed deep reinforcement learning for autonomous driving. It looks at the reinforcement learning algorithm, in particular the reward function and network architecture, and at how transfer learning is done. It also covers launching the local training job and running the model, and emphasizes the importance of distributed training.


Autonomous Driving Cookbook

Good material worth recording here, to push myself to keep studying it.

End-to-end deep learning tutorial

To review later.

Distributed Deep Reinforcement Learning for Autonomous Driving

Distributed training and reinforcement learning for autonomous driving

ExploreAlgorithm (exploring the algorithm)

Step 1 - Explore the Algorithm

In this notebook you will get an overview of the reinforcement learning algorithm being used for this experiment and the implementation of distributed learning.

The reward function

To compute our reward function, we begin by computing the distance from the car to the center of the nearest road. We then pass that distance through an exponential weighting function to map this component into the range [0, 1].

import math
import numpy as np

def compute_reward(car_state, collision_info, road_points):
    #Define some constant parameters for the reward function
    THRESH_DIST = 3.5                # The maximum distance from the center of the road to compute the reward function
    DISTANCE_DECAY_RATE = 1.2        # The rate at which the reward decays for the distance function
    CENTER_SPEED_MULTIPLIER = 2.0    # The ratio at which we prefer the distance reward to the speed reward
    
    # If the car is stopped, the reward is always zero
    speed = car_state.speed
    if (speed < 2):
        return 0
    
    #Get the car position
    position_key = bytes('position', encoding='utf8')
    x_val_key = bytes('x_val', encoding='utf8')
    y_val_key = bytes('y_val', encoding='utf8')

    car_point = np.array([car_state.kinematics_true[position_key][x_val_key], car_state.kinematics_true[position_key][y_val_key], 0])
    
    # The distance component is an exponential of the distance to the nearest center line.
    # Start with a large sentinel value before scanning the road segments.
    distance = 999
    
    #Compute the distance to the nearest center line
    for line in road_points:
        local_distance = 0
        length_squared = ((line[0][0]-line[1][0])**2) + ((line[0][1]-line[1][1])**2)
        if (length_squared != 0):
            t = max(0, min(1, np.dot(car_point-line[0], line[1]-line[0]) / length_squared))
            proj = line[0] + (t * (line[1]-line[0]))
            local_distance = np.linalg.norm(proj - car_point)
        
        distance = min(distance, local_distance)
        
    distance_reward = math.exp(-(distance * DISTANCE_DECAY_RATE))
    
    return distance_reward
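
To make the reward shape concrete, here is a small sketch with hypothetical road data (the real road_points are loaded from road_points.txt) showing the segment format the function expects and what the exponential decay works out to:

import math
import numpy as np

# Hypothetical road_points: each entry is a [start, end] pair of numpy arrays
# describing one road-center segment in the same (x, y, 0) frame as car_point.
road_points = [
    [np.array([0.0, 0.0, 0.0]), np.array([100.0, 0.0, 0.0])],
    [np.array([100.0, 0.0, 0.0]), np.array([100.0, 80.0, 0.0])],
]

# With DISTANCE_DECAY_RATE = 1.2, a car sitting on a center line earns 1.0,
# a car 1 m away earns exp(-1.2) ≈ 0.30, and a car 3 m away only exp(-3.6) ≈ 0.03.
for d in (0.0, 1.0, 3.0):
    print(d, math.exp(-d * 1.2))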

Network architecture and transfer learning

If you decide to go the transfer learning route, you will notice that the initial behaviour of the car is much less random. It still won't drive perfectly since, first, our end-to-end model was not the best possible version of itself to begin with, and second, it has never seen elements such as other cars, houses, etc.


from keras.initializers import random_normal
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Model
from keras.optimizers import Adam

activation = 'relu'
 
# The main model input.
pic_input = Input(shape=(59,255,3))
train_conv_layers = False # Freeze the convolution layers for transfer learning; set to True when training from the ground up.
 
img_stack = Conv2D(16, (3, 3), name='convolution0', padding='same', activation=activation, trainable=train_conv_layers)(pic_input)
img_stack = MaxPooling2D(pool_size=(2,2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution1', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution2', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Flatten()(img_stack)
img_stack = Dropout(0.2)(img_stack)
 
img_stack = Dense(128, name='rl_dense', kernel_initializer=random_normal(stddev=0.01))(img_stack)
img_stack = Dropout(0.2)(img_stack)
output = Dense(5, name='rl_output', kernel_initializer=random_normal(stddev=0.01))(img_stack)
 
opt = Adam()
action_model = Model(inputs=[pic_input], outputs=output)
 
action_model.compile(optimizer=opt, loss='mean_squared_error')
action_model.summary()
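
If you go the transfer learning route, the pretrained weights from the end-to-end tutorial can be loaded into this model by layer name, so the frozen convolution layers pick up their trained values while the freshly added rl_dense and rl_output layers stay randomly initialized. A minimal sketch, assuming a hypothetical path to the weights file:

pretrained_weights_path = 'Share/data/pretrained_model_weights.h5'  # hypothetical path
action_model.load_weights(pretrained_weights_path, by_name=True)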

Launch Local Training Job

First, define the hyperparameters (a sketch with example values follows the list):

  • batch_update_frequency: This is how often the weights from the actively trained network get copied to the target network. It is also how often the model gets saved to disk. For more details on how this works, check out the Deep Q-learning paper.
  • max_epoch_runtime_sec: This is the maximum runtime for each epoch. If the car has not reached a terminal state after this many seconds, the epoch will be terminated and training will begin.
  • per_iter_epsilon_reduction: The agent uses an epsilon greedy linear annealing strategy while training. This is the amount by which epsilon is reduced each iteration.
  • min_epsilon: The minimum value for epsilon. Once reached, the epsilon value will not decrease any further.
  • batch_size: The minibatch size to use for training.
  • replay_memory_size: The number of examples to keep in the replay memory. The replay memory is a FIFO buffer used to reduce the effects of nearby states being correlated. Minibatches are generated from randomly selecting examples from the replay memory.
  • weights_path: If we are doing transfer learning and using pretrained weights for the model, they will be loaded from this path.
  • train_conv_layers: If we are using pretrained weights, we may prefer to freeze the convolutional layers to speed up training.
  • airsim_path: The path to the folder containing the .ps1 to start AirSim. This path cannot contain spaces.
  • data_dir: The path to the directory containing the road_points.txt and reward_points.txt used to compute the reward function. This path cannot contain spaces.
  • experiment_name: A unique identifier for this experiment
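A sketch of how these parameters might look when collected together. The values below are hypothetical placeholders, not recommendations; tune them for your machine and AirSim environment:

train_params = {
    'batch_update_frequency': 300,
    'max_epoch_runtime_sec': 30,
    'per_iter_epsilon_reduction': 0.003,
    'min_epsilon': 0.1,
    'batch_size': 32,
    'replay_memory_size': 2000,
    'weights_path': 'Share/data/pretrained_model_weights.h5',  # only used for transfer learning
    'train_conv_layers': False,
    'airsim_path': 'C:/AirSim',            # must not contain spaces
    'data_dir': 'C:/DistributedRL/Share',  # must not contain spaces
    'experiment_name': 'local_run_0',
}

With these placeholder values, epsilon decays linearly from 1.0 to min_epsilon after roughly (1.0 - 0.1) / 0.003 = 300 iterations, after which the agent keeps exploring 10% of the time.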

Run Model

import datetime
import time

# car_client, car_controls, model, and get_image come from the earlier cells of this notebook.
def append_to_ring_buffer(item, buffer, buffer_size):
    if (len(buffer) >= buffer_size):
        buffer = buffer[1:]
    buffer.append(item)
    return buffer

state_buffer = []
state_buffer_len = 4

print('Running car for a few seconds...')
car_controls.steering = 0
car_controls.throttle = 1
car_controls.brake = 0
car_client.setCarControls(car_controls)
stop_run_time = datetime.datetime.now() + datetime.timedelta(seconds=2)
while(datetime.datetime.now() < stop_run_time):
    time.sleep(0.01)
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)

print('Running model')
while(True):
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)
    next_state, dummy = model.predict_state(state_buffer)
    next_control_signal = model.state_to_control_signals(next_state, car_client.getCarState())

    car_controls.steering = next_control_signal[0]
    car_controls.throttle = next_control_signal[1]
    car_controls.brake = next_control_signal[2]

    print('State = {0}, steering = {1}, throttle = {2}, brake = {3}'.format(next_state, car_controls.steering, car_controls.throttle, car_controls.brake))

    car_client.setCarControls(car_controls)

    time.sleep(0.1)
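
The model's five outputs (rl_output above) correspond to discrete actions, and state_to_control_signals turns the chosen action into a (steering, throttle, brake) tuple. As a purely illustrative sketch, the action-to-steering mapping and speed threshold below are assumptions, not the cookbook's exact values:

def state_to_control_signals_sketch(state, car_state):
    # Hypothetical mapping from the 5 discrete actions to steering angles.
    angle_values = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # Brake instead of accelerating once the car is already going fast.
    if car_state.speed > 9:
        return (angle_values[state], 0.0, 1.0)  # (steering, throttle, brake)
    return (angle_values[state], 1.0, 0.0)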

Distributed Training
