AutonomousDrivingCookbook from the Microsoft team

This post walks through the Autonomous Driving Cookbook, focusing on the end-to-end deep learning tutorial and on distributed deep reinforcement learning for autonomous driving. It looks at the reinforcement learning algorithm, in particular the reward function and network architecture, and at how transfer learning is done. It also covers launching the local training job and running the model, and emphasizes the importance of distributed training.


Autonomous Driving Cookbook

Good material worth recording here, to push myself to keep studying it.

End-to-end deep learning tutorial

To review later.

Distributed Deep Reinforcement Learning for Autonomous Driving

Distributed training and reinforcement learning for autonomous driving

ExploreAlgorithm (exploring the algorithm)

Step 1 - Explore the Algorithm

In this notebook you will get an overview of the reinforcement learning algorithm being used for this experiment and the implementation of distributed learning.

The reward function

To compute our reward function, we begin by computing the distance from the car to the center of the nearest road. We then pass that distance through an exponential weighting function to map this component into the range [0, 1].

import math
import numpy as np

def compute_reward(car_state, collision_info, road_points):
    #Define some constant parameters for the reward function
    THRESH_DIST = 3.5                # The maximum distance from the center of the road to compute the reward function
    DISTANCE_DECAY_RATE = 1.2        # The rate at which the reward decays for the distance function
    CENTER_SPEED_MULTIPLIER = 2.0    # The ratio at which we prefer the distance reward to the speed reward
    
    # If the car is stopped, the reward is always zero
    speed = car_state.speed
    if (speed < 2):
        return 0
    
    #Get the car position
    position_key = bytes('position', encoding='utf8')
    x_val_key = bytes('x_val', encoding='utf8')
    y_val_key = bytes('y_val', encoding='utf8')

    car_point = np.array([car_state.kinematics_true[position_key][x_val_key], car_state.kinematics_true[position_key][y_val_key], 0])
    
    # The distance component is an exponential of the distance to the nearest center line.
    # Start with a large sentinel value before scanning the road segments.
    distance = 999
    
    #Compute the distance to the nearest center line
    for line in road_points:
        local_distance = 0
        length_squared = ((line[0][0]-line[1][0])**2) + ((line[0][1]-line[1][1])**2)
        if (length_squared != 0):
            t = max(0, min(1, np.dot(car_point-line[0], line[1]-line[0]) / length_squared))
            proj = line[0] + (t * (line[1]-line[0]))
            local_distance = np.linalg.norm(proj - car_point)
        
        distance = min(distance, local_distance)
        
    distance_reward = math.exp(-(distance * DISTANCE_DECAY_RATE))
    
    return distance_reward
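
To make the reward shape concrete, here is a small sketch with hypothetical road data (the real road_points are loaded from road_points.txt) showing the segment format the function expects and what the exponential decay works out to:

import math
import numpy as np

# Hypothetical road_points: each entry is a [start, end] pair of numpy arrays
# describing one road-center segment in the same (x, y, 0) frame as car_point.
road_points = [
    [np.array([0.0, 0.0, 0.0]), np.array([100.0, 0.0, 0.0])],
    [np.array([100.0, 0.0, 0.0]), np.array([100.0, 80.0, 0.0])],
]

# With DISTANCE_DECAY_RATE = 1.2, a car sitting on a center line earns 1.0,
# a car 1 m away earns exp(-1.2) ≈ 0.30, and a car 3 m away only exp(-3.6) ≈ 0.03.
for d in (0.0, 1.0, 3.0):
    print(d, math.exp(-d * 1.2))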

Network architecture and transfer learning

If you decide to go the transfer learning route, you will notice that the initial behaviour of the car is much less random. It still won't drive perfectly since, first, our end-to-end model was not the best possible version of itself to begin with, and second, it has never seen elements such as other cars, houses, etc.


from keras.initializers import random_normal
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Model
from keras.optimizers import Adam

activation = 'relu'
 
# The main model input.
pic_input = Input(shape=(59,255,3))
train_conv_layers = False # Freeze the convolution layers for transfer learning; set to True when training from the ground up.
 
img_stack = Conv2D(16, (3, 3), name='convolution0', padding='same', activation=activation, trainable=train_conv_layers)(pic_input)
img_stack = MaxPooling2D(pool_size=(2,2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution1', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution2', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Flatten()(img_stack)
img_stack = Dropout(0.2)(img_stack)
 
img_stack = Dense(128, name='rl_dense', kernel_initializer=random_normal(stddev=0.01))(img_stack)
img_stack = Dropout(0.2)(img_stack)
output = Dense(5, name='rl_output', kernel_initializer=random_normal(stddev=0.01))(img_stack)
 
opt = Adam()
action_model = Model(inputs=[pic_input], outputs=output)
 
action_model.compile(optimizer=opt, loss='mean_squared_error')
action_model.summary()
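
If you go the transfer learning route, the pretrained weights from the end-to-end tutorial can be loaded into this model by layer name, so the frozen convolution layers pick up their trained values while the freshly added rl_dense and rl_output layers stay randomly initialized. A minimal sketch, assuming a hypothetical path to the weights file:

pretrained_weights_path = 'Share/data/pretrained_model_weights.h5'  # hypothetical path
action_model.load_weights(pretrained_weights_path, by_name=True)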

Launch Local Training Job

First, define the hyperparameters (a sketch with example values follows the list):

  • batch_update_frequency: This is how often the weights from the actively trained network get copied to the target network. It is also how often the model gets saved to disk. For more details on how this works, check out the Deep Q-learning paper.
  • max_epoch_runtime_sec: This is the maximum runtime for each epoch. If the car has not reached a terminal state after this many seconds, the epoch will be terminated and training will begin.
  • per_iter_epsilon_reduction: The agent uses an epsilon greedy linear annealing strategy while training. This is the amount by which epsilon is reduced each iteration.
  • min_epsilon: The minimum value for epsilon. Once reached, the epsilon value will not decrease any further.
  • batch_size: The minibatch size to use for training.
  • replay_memory_size: The number of examples to keep in the replay memory. The replay memory is a FIFO buffer used to reduce the effects of nearby states being correlated. Minibatches are generated from randomly selecting examples from the replay memory.
  • weights_path: If we are doing transfer learning and using pretrained weights for the model, they will be loaded from this path.
  • train_conv_layers: If we are using pretrained weights, we may prefer to freeze the convolutional layers to speed up training.
  • airsim_path: The path to the folder containing the .ps1 to start AirSim. This path cannot contain spaces.
  • data_dir: The path to the directory containing the road_points.txt and reward_points.txt used to compute the reward function. This path cannot contain spaces.
  • experiment_name: A unique identifier for this experiment
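A sketch of how these parameters might look when collected together. The values below are hypothetical placeholders, not recommendations; tune them for your machine and AirSim environment:

train_params = {
    'batch_update_frequency': 300,
    'max_epoch_runtime_sec': 30,
    'per_iter_epsilon_reduction': 0.003,
    'min_epsilon': 0.1,
    'batch_size': 32,
    'replay_memory_size': 2000,
    'weights_path': 'Share/data/pretrained_model_weights.h5',  # only used for transfer learning
    'train_conv_layers': False,
    'airsim_path': 'C:/AirSim',            # must not contain spaces
    'data_dir': 'C:/DistributedRL/Share',  # must not contain spaces
    'experiment_name': 'local_run_0',
}

With these placeholder values, epsilon decays linearly from 1.0 to min_epsilon after roughly (1.0 - 0.1) / 0.003 = 300 iterations, after which the agent keeps exploring 10% of the time.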

Run Model

import datetime
import time

# car_client, car_controls, model, and get_image come from the earlier cells of this notebook.
def append_to_ring_buffer(item, buffer, buffer_size):
    if (len(buffer) >= buffer_size):
        buffer = buffer[1:]
    buffer.append(item)
    return buffer

state_buffer = []
state_buffer_len = 4

print('Running car for a few seconds...')
car_controls.steering = 0
car_controls.throttle = 1
car_controls.brake = 0
car_client.setCarControls(car_controls)
stop_run_time = datetime.datetime.now() + datetime.timedelta(seconds=2)
while(datetime.datetime.now() < stop_run_time):
    time.sleep(0.01)
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)

print('Running model')
while(True):
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)
    next_state, dummy = model.predict_state(state_buffer)
    next_control_signal = model.state_to_control_signals(next_state, car_client.getCarState())

    car_controls.steering = next_control_signal[0]
    car_controls.throttle = next_control_signal[1]
    car_controls.brake = next_control_signal[2]

    print('State = {0}, steering = {1}, throttle = {2}, brake = {3}'.format(next_state, car_controls.steering, car_controls.throttle, car_controls.brake))

    car_client.setCarControls(car_controls)

    time.sleep(0.1)
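
The model's five outputs (rl_output above) correspond to discrete actions, and state_to_control_signals turns the chosen action into a (steering, throttle, brake) tuple. As a purely illustrative sketch, the action-to-steering mapping and speed threshold below are assumptions, not the cookbook's exact values:

def state_to_control_signals_sketch(state, car_state):
    # Hypothetical mapping from the 5 discrete actions to steering angles.
    angle_values = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # Brake instead of accelerating once the car is already going fast.
    if car_state.speed > 9:
        return (angle_values[state], 0.0, 1.0)  # (steering, throttle, brake)
    return (angle_values[state], 1.0, 0.0)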

Distributed Training
