AutonomousDrivingCookbook from the Microsoft team
A self-study cookbook for autonomous driving. Good material; recording it here to push myself to keep learning.
End-to-end deep learning tutorial
To review later.
Distributed Deep Reinforcement Learning for Autonomous Driving
ExploreAlgorithm: exploring the algorithm
Step 1 - Explore the Algorithm
In this notebook you will get an overview of the reinforcement learning algorithm used in this experiment and of how distributed learning is implemented.
The reward function
To compute the reward, we begin by finding the distance from the car to the center of the nearest road. We then pass that distance through an exponential weighting function to map it into the range [0, 1].
import math
import numpy as np

def compute_reward(car_state, collision_info, road_points):
    # Constant parameters for the reward function
    THRESH_DIST = 3.5              # The maximum distance from the center of the road used by the reward function
    DISTANCE_DECAY_RATE = 1.2      # The rate at which the reward decays with distance from the center line
    CENTER_SPEED_MULTIPLIER = 2.0  # The ratio at which we prefer the distance reward to the speed reward

    # If the car is stopped, the reward is always zero
    speed = car_state.speed
    if speed < 2:
        return 0

    # Get the car position
    position_key = bytes('position', encoding='utf8')
    x_val_key = bytes('x_val', encoding='utf8')
    y_val_key = bytes('y_val', encoding='utf8')
    car_point = np.array([car_state.kinematics_true[position_key][x_val_key],
                          car_state.kinematics_true[position_key][y_val_key],
                          0])

    # Distance component is the exponential of the distance to the nearest center line.
    # Each entry of road_points is a pair of endpoints defining a center-line segment;
    # project the car position onto each segment and keep the smallest distance.
    distance = 999
    for line in road_points:
        local_distance = 0
        length_squared = ((line[0][0] - line[1][0]) ** 2) + ((line[0][1] - line[1][1]) ** 2)
        if length_squared != 0:
            t = max(0, min(1, np.dot(car_point - line[0], line[1] - line[0]) / length_squared))
            proj = line[0] + (t * (line[1] - line[0]))
            local_distance = np.linalg.norm(proj - car_point)
        distance = min(distance, local_distance)

    distance_reward = math.exp(-(distance * DISTANCE_DECAY_RATE))
    return distance_reward
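To get a feel for how quickly the reward decays, a quick check with a few hypothetical distances (in meters) shows that a car on the center line earns the full reward of 1.0, while one two meters off center earns less than 0.1:

import math

DISTANCE_DECAY_RATE = 1.2

# Hypothetical distances from the road center line, in meters.
for d in [0.0, 0.5, 1.0, 2.0, 3.5]:
    print('distance = {:.1f} -> reward = {:.3f}'.format(d, math.exp(-d * DISTANCE_DECAY_RATE)))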
Network architecture and transfer learning
If you decide to go the transfer learning route, you will notice that the initial behaviour of the car is much less random. It still won't drive perfectly, for two reasons: first, our end-to-end model was not the best possible version of itself to begin with, and second, it has never seen elements like other cars, houses, etc.
from keras.initializers import random_normal
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Model
from keras.optimizers import Adam

activation = 'relu'

# The main model input.
pic_input = Input(shape=(59, 255, 3))

# Freeze the convolutional layers when transfer learning from the pretrained
# end-to-end model; set to True when training from the ground up.
train_conv_layers = False

img_stack = Conv2D(16, (3, 3), name='convolution0', padding='same', activation=activation, trainable=train_conv_layers)(pic_input)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution1', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution2', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Flatten()(img_stack)
img_stack = Dropout(0.2)(img_stack)

# The dense head outputs one value per discrete action.
img_stack = Dense(128, name='rl_dense', kernel_initializer=random_normal(stddev=0.01))(img_stack)
img_stack = Dropout(0.2)(img_stack)
output = Dense(5, name='rl_output', kernel_initializer=random_normal(stddev=0.01))(img_stack)

opt = Adam()
action_model = Model(inputs=[pic_input], outputs=output)
action_model.compile(optimizer=opt, loss='mean_squared_error')
action_model.summary()
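If you do take the transfer-learning route, the pretrained end-to-end weights have to be copied into this network. Here is a minimal sketch, assuming the weights file uses matching layer names (load_weights with by_name=True is standard Keras; the exact file name and layer mapping are assumptions):

# weights_path is a hypothetical placeholder; the real path comes from the
# hyperparameters defined below.
weights_path = 'pretrained_end_to_end.h5'

# Only layers whose names match entries in the file (e.g. convolution0..2)
# are initialized from disk; rl_dense and rl_output keep their random init.
action_model.load_weights(weights_path, by_name=True)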
Launch Local Training Job
First, let's define the hyperparameters (a sample configuration sketch follows the list):
- batch_update_frequency: This is how often the weights from the actively trained network get copied to the target network. It is also how often the model gets saved to disk. For more details on how this works, check out the Deep Q-learning paper.
- max_epoch_runtime_sec: This is the maximum runtime for each epoch. If the car has not reached a terminal state after this many seconds, the epoch will be terminated and training will begin.
- per_iter_epsilon_reduction: The agent uses an epsilon-greedy strategy with linear annealing while training. This is the amount by which epsilon is reduced each iteration.
- min_epsilon: The minimum value for epsilon. Once reached, the epsilon value will not decrease any further.
- batch_size: The minibatch size to use for training.
- replay_memory_size: The number of examples to keep in the replay memory. The replay memory is a FIFO buffer used to reduce the effects of nearby states being correlated. Minibatches are generated by randomly selecting examples from the replay memory.
- weights_path: If we are doing transfer learning and using pretrained weights for the model, they will be loaded from this path.
- train_conv_layers: If we are using pretrained weights, we may prefer to freeze the convolutional layers to speed up training.
- airsim_path: The path to the folder containing the .ps1 to start AirSim. This path cannot contain spaces.
- data_dir: The path to the directory containing the road_points.txt and reward_points.txt used to compute the reward function. This path cannot contain spaces.
- experiment_name: A unique identifier for this experiment.
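As mentioned above, here is a sample configuration; the values and paths are illustrative only, and the annealing helper just restates the epsilon rule from the list:

# Illustrative values; the paths are hypothetical and must not contain spaces.
batch_update_frequency = 300
max_epoch_runtime_sec = 30
per_iter_epsilon_reduction = 0.003
min_epsilon = 0.1
batch_size = 32
replay_memory_size = 2000
weights_path = 'C:/DistributedRL/pretrain_model_weights.h5'
train_conv_layers = False
airsim_path = 'C:/AirSim'
data_dir = 'C:/DistributedRL/data'
experiment_name = 'local_run'

def anneal_epsilon(epsilon):
    # Linear annealing: shrink epsilon by a fixed amount each iteration,
    # but never let it drop below min_epsilon.
    return max(min_epsilon, epsilon - per_iter_epsilon_reduction)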
Run Model
def append_to_ring_buffer(item, buffer, buffer_size):
    # Drop the oldest entry once the buffer is full, then append the new item.
    if len(buffer) >= buffer_size:
        buffer = buffer[1:]
    buffer.append(item)
    return buffer
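Note that append_to_ring_buffer reassigns buffer rather than mutating it in place, so callers must keep the returned list, as the code below does. The same FIFO behavior is also available from the standard library; a minimal alternative sketch:

from collections import deque

# maxlen makes the deque drop its oldest entry automatically on append,
# so no reassignment is needed.
state_buffer = deque(maxlen=4)
state_buffer.append('new_frame')  # 'new_frame' stands in for an image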
import datetime
import time

state_buffer = []
state_buffer_len = 4

# car_client, car_controls, and the trained model object come from earlier
# cells in the notebook. Drive straight for two seconds to fill the buffer.
print('Running car for a few seconds...')
car_controls.steering = 0
car_controls.throttle = 1
car_controls.brake = 0
car_client.setCarControls(car_controls)
stop_run_time = datetime.datetime.now() + datetime.timedelta(seconds=2)
while datetime.datetime.now() < stop_run_time:
    time.sleep(0.01)
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)

print('Running model')
while True:
    state_buffer = append_to_ring_buffer(get_image(car_client), state_buffer, state_buffer_len)
    next_state, dummy = model.predict_state(state_buffer)
    next_control_signal = model.state_to_control_signals(next_state, car_client.getCarState())

    car_controls.steering = next_control_signal[0]
    car_controls.throttle = next_control_signal[1]
    car_controls.brake = next_control_signal[2]

    print('State = {0}, steering = {1}, throttle = {2}, brake = {3}'.format(
        next_state, car_controls.steering, car_controls.throttle, car_controls.brake))

    car_client.setCarControls(car_controls)
    time.sleep(0.1)
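The loop above also depends on a get_image helper that is not shown in this snippet. A minimal sketch of what it might look like, assuming the current airsim Python package (the cookbook ships its own client code, so treat the exact calls and the crop region as assumptions):

import airsim
import numpy as np

def get_image(car_client):
    # Request one uncompressed RGB scene image from the front camera.
    responses = car_client.simGetImages([
        airsim.ImageRequest('0', airsim.ImageType.Scene, False, False)])
    response = responses[0]
    frame = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    frame = frame.reshape(response.height, response.width, 3)
    # Crop to the 59x255x3 region matching the network's Input layer.
    return frame[76:135, 0:255, 0:3].astype(float)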