Andrew Ng Deep Learning Course 5, Week 1, Programming Assignment 2: Dinosaurus_Island_Character_level_language_model

This post walks through the Dinosaurus Island character-level language model programming assignment from Andrew Ng's deep learning course. The assignment covers the problem statement, data preprocessing, an overview of the model, building the model, gradient clipping, and sampling, with the goal of training an RNN that predicts dinosaur names. Gradient clipping is used to avoid exploding gradients, and sampling is used to generate new character sequences.


Packages

In [2]:
import numpy as np
from utils import *
import random
import pprint
import copy

1 - Problem Statement

1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.

In [3]:
data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
There are 19909 total characters and 27 unique characters in your data.
  • The characters are a-z (26 characters) plus the "\n" (or newline character).
  • In this assignment, the newline character "\n" plays a role similar to the <EOS> (or "End of sentence") token discussed in lecture.
    • Here, "\n" indicates the end of the dinosaur name rather than the end of a sentence.
  • char_to_ix: In the cell below, you'll create a Python dictionary (i.e., a hash table) to map each character to an index from 0-26.
  • ix_to_char: Then, you'll create a second Python dictionary that maps each index back to the corresponding character.
    • This will help you figure out which index corresponds to which character in the probability distribution output of the softmax layer.
In [4]:
chars = sorted(chars)
print(chars)
['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
In [5]:
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(ix_to_char)
{   0: '\n',
    1: 'a',
    2: 'b',
    3: 'c',
    4: 'd',
    5: 'e',
    6: 'f',
    7: 'g',
    8: 'h',
    9: 'i',
    10: 'j',
    11: 'k',
    12: 'l',
    13: 'm',
    14: 'n',
    15: 'o',
    16: 'p',
    17: 'q',
    18: 'r',
    19: 's',
    20: 't',
    21: 'u',
    22: 'v',
    23: 'w',
    24: 'x',
    25: 'y',
    26: 'z'}

1.2 - Overview of the Model

Your model will have the following structure:

  • Initialize parameters
  • Run the optimization loop
    • Forward propagation to compute the loss function
    • Backward propagation to compute the gradients with respect to the loss function
    • Clip the gradients to avoid exploding gradients
    • Using the gradients, update your parameters with the gradient descent update rule.
  • Return the learned parameters

Figure 1: Recurrent Neural Network, similar to what you built in the previous notebook "Building a Recurrent Neural Network - Step by Step."

  • At each time-step, the RNN tries to predict what the next character is, given the previous characters.
  • $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters from the training set.
  • $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is the same list of characters but shifted one character forward.
  • At every time step $t$, $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$: the prediction at time $t$ is the same as the input at time $t+1$ (a small example follows below).
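As an ungraded illustration of this input/label shift, the sketch below builds one (X, Y) pair from a single dinosaur name using the char_to_ix mapping defined above. The leading None stands for a dummy input at the first time step; the name "turiasaurus" and the variable names here are only for the example, not part of the graded code.

name = "turiasaurus"
X = [None] + [char_to_ix[ch] for ch in name]   # inputs x<1>, ..., x<Tx>; None = dummy first input
Y = X[1:] + [char_to_ix["\n"]]                 # labels: X shifted one step forward, terminated by "\n"
print(X)   # [None, 20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19]
print(Y)   # [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19, 0]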

2 - Building Blocks of the Model

In this part, you will build two important blocks of the overall model:

  1. Gradient clipping: to avoid exploding gradients
  2. Sampling: a technique used to generate characters

You will then apply these two functions to build the model.

2.1 - Clipping the Gradients in the Optimization Loop

In this section you will implement the clip function that you will call inside of your optimization loop.

Exploding gradients

  • When gradients are very large, they're called "exploding gradients."
  • Exploding gradients make the training process more difficult, because the updates may be so large that they "overshoot" the optimal values during back propagation.

Recall that your overall loop structure usually consists of:

  • forward pass,
  • cost computation,
  • backward pass,
  • parameter update.

Before updating the parameters, you will perform gradient clipping to make sure that your gradients are not "exploding."
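To make the placement of clipping concrete, here is a rough, ungraded sketch of a single optimization step. It assumes the helper functions rnn_forward, rnn_backward, and update_parameters provided by the assignment's utils (their exact signatures may differ), and clip is the function you will implement in Exercise 1 below.

def optimize_step_sketch(X, Y, a_prev, parameters, learning_rate=0.01):
    loss, cache = rnn_forward(X, Y, a_prev, parameters)        # forward pass + cost computation
    gradients, a = rnn_backward(X, Y, parameters, cache)       # backward pass
    gradients = clip(gradients, maxValue=5)                    # clip before the parameter update
    parameters = update_parameters(parameters, gradients, learning_rate)
    return loss, gradients, a[len(X) - 1]                      # also return the last hidden state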

Gradient clipping

In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of gradients, if needed.

  • There are different ways to clip gradients.
  • You will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped so that it falls within the range [-N, N].
  • For example, if N = 10:
    • The range is [-10, 10]
    • If any component of the gradient vector is greater than 10, it is set to 10.
    • If any component of the gradient vector is less than -10, it is set to -10.
    • If any components are between -10 and 10, they keep their original values.

Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into "exploding gradient" problems.

Exercise 1 - clip

Return the clipped gradients of your dictionary gradients.

  • Your function takes in a maximum threshold and returns the clipped versions of the gradients.
  • You can check out numpy.clip for more info.
    • You will need to use the argument "out = ...".
    • Using the "out" parameter allows you to update a variable "in-place".
    • If you don't use the "out" argument, np.clip returns a new clipped array (e.g., assigned to the loop variable "gradient") and the original gradient variables dWax, dWaa, dWya, db, dby are not updated (see the short demo after this list).
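The snippet below is a small, ungraded demo of that in-place behavior; the array g is just a made-up example.

g = np.array([[12.0, -3.0], [-15.0, 7.0]])
np.clip(g, -10, 10)             # returns a clipped copy; g itself is unchanged
print(g)                        # still [[12., -3.], [-15., 7.]]
np.clip(g, -10, 10, out=g)      # writes the clipped values back into g
print(g)                        # now [[10., -3.], [-10., 7.]]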
In [6]:
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    gradients = copy.deepcopy(gradients)
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    ### START CODE HERE ###
    # Clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient,-maxValue,maxValue,out=gradient)
    ### END CODE HERE ###
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients
In [7]:
# Test with a max value of 10
def clip_test(target, mValue):
    print(f"\nGradients for mValue={mValue}")
    np.random.seed(3)
    dWax = np.random.randn(5, 3) * 10
    dWaa = np.random.randn(5, 5) * 10
    dWya = np.random.randn(2, 5) * 10
    db = np.random.randn(5, 1) * 10
    dby = np.random.randn(2, 1) * 10
    gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}

    gradients2 = target(gradients, mValue)
    print("gradients[\"dWaa\"][1][2] =", gradients2["dWaa"][1][2])
    print("gradients[\"dWax\"][3][1] =", gradients2["dWax"][3][1])
    print("gradients[\"dWya\"][1][2] =", gradients2["dWya"][1][2])
    print("gradients[\"db\"][4] =", gradients2["db"][4])
    print("gradients[\"dby\"][1] =", gradients2["dby"][1])

clip_test(clip, 10)
clip_test(clip, 5)