The Multiclass SVM loss for the i-th example is then formalized as follows:
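    L_i = \sum_{j \neq y_i} \max\big(0,\; s_j - s_{y_i} + \Delta\big)

where s_j is the score assigned to class j for example x_i (the j-th entry of x_i W), y_i is the correct class, and \Delta is the margin (set to 1.0 in the code below).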
The most common regularization penalty is the L2 norm that discourages large weights through an elementwise quadratic penalty over all parameters:
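    R(W) = \sum_k \sum_l W_{k,l}^2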
The full Multiclass SVM loss becomes:
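    L = \frac{1}{N} \sum_i L_i \;+\; \lambda R(W)

where N is the number of training examples and \lambda is the regularization strength (reg in the code).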
Gradient:
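As derived in the referenced cs231n notes, the per-example loss depends on W only through the scores, and the column-wise gradients are

    \nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\big(s_j - s_{y_i} + \Delta > 0\big) \right) x_i

    \nabla_{w_j} L_i = \mathbb{1}\big(s_j - s_{y_i} + \Delta > 0\big)\, x_i \qquad (j \neq y_i)

where \mathbb{1}(\cdot) is the indicator function and w_j is the j-th column of W. The vectorized code below implements exactly this: positive margins are turned into ones, the correct-class column receives minus the row sum, and the regularization term adds 2 \lambda W.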
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.

    Inputs have dimension D, there are C classes, and we operate on
    minibatches of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    num_train = X.shape[0]                                # N
    scores = X.dot(W)                                     # (N, C) class scores
    scores_correct = scores[np.arange(num_train), y]      # (N,) correct-class scores
    scores_correct = np.reshape(scores_correct, (-1, 1))  # (N, 1) for broadcasting
    margins = scores - scores_correct + 1.0               # delta = 1.0
    margins[np.arange(num_train), y] = 0                  # skip j == y_i
    margins[margins < 0] = 0                              # max(0, s_j - s_{y_i} + delta)
    loss = np.sum(margins) / num_train + reg * np.sum(W * W)

    # Backward pass: each positive margin contributes +x_i to column j
    # and -x_i to the correct-class column y_i.
    margins[margins > 0] = 1.0
    row_sum = margins.sum(axis=1)                         # (N,) count of positive margins per example
    margins[np.arange(num_train), y] = -row_sum
    dW = X.T.dot(margins) / num_train + 2 * reg * W       # (D, C); gradient of reg*sum(W^2) is 2*reg*W

    return loss, dW
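As a quick sanity check, the analytic gradient can be compared against a centered finite-difference estimate on a tiny random problem. The sketch below assumes the svm_loss_vectorized defined above; numerical_grad_check is a helper name introduced here only for illustration.

import numpy as np

def numerical_grad_check(W, X, y, reg, num_checks=5, h=1e-5):
    """Compare analytic dW against a centered finite-difference estimate
    at a few randomly chosen entries of W (hypothetical helper)."""
    loss, dW = svm_loss_vectorized(W, X, y, reg)
    for _ in range(num_checks):
        ix = tuple(np.random.randint(n) for n in W.shape)
        old = W[ix]
        W[ix] = old + h
        loss_plus, _ = svm_loss_vectorized(W, X, y, reg)
        W[ix] = old - h
        loss_minus, _ = svm_loss_vectorized(W, X, y, reg)
        W[ix] = old                                       # restore original value
        grad_numeric = (loss_plus - loss_minus) / (2 * h)
        grad_analytic = dW[ix]
        rel_error = abs(grad_numeric - grad_analytic) / (abs(grad_numeric) + abs(grad_analytic) + 1e-12)
        print('numerical: %f  analytic: %f  relative error: %e' % (grad_numeric, grad_analytic, rel_error))

# Example usage on random data:
np.random.seed(0)
N, D, C = 10, 5, 3
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = np.random.randn(D, C) * 0.01
numerical_grad_check(W, X, y, reg=0.1)

Relative errors around 1e-7 or smaller indicate the analytic gradient matches the numerical one.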
Reference: http://cs231n.github.io/optimization-1/