Today, we will implement loss and accuracy calculations so that we can observe how our network performs in its current, untrained state. This is the last step before we move on to optimization and training. Training involves doing multiple forward and backward passes, adjusting weights and biases, and monitoring loss and accuracy: if loss goes down and accuracy goes up after each pass, the network is actually learning from the data.
For multi-class classification tasks like ours, Categorical Crossentropy (CC) is the most popular loss function. For binary classification with only 2 classes, the equivalent would be Binary Crossentropy.
These functions take the results of the Softmax activation function and compare them to the ground truths - the `y` values in the dataset. When the confidence in the Softmax output for the correct class equals 1 (the network is 100% sure about its prediction), the loss equals 0; the lower that confidence, the higher the loss.
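With the negative-log formula we will use below, a correct-class confidence of 0.99 gives a loss of roughly 0.01, a confidence of 0.5 gives roughly 0.69, and a confidence of 0.1 gives roughly 2.3 - and as the confidence approaches 0, the loss grows towards infinity.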
Imagine Softmax outputs that look like the following:
let softmaxOutputs = [[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]]
...and the `y` values for that batch of outputs are the following (representing cat, dog, and dog):
let classTargets = [0, 1, 1]
In this case, we have 2 classes: 0 - cat, 1 - dog. If we encode the classes this way (and we do in the sample spiral data used in this tutorial series), the class labels double as indexes into each Softmax output row, which we can use to retrieve the corresponding predicted confidences. To keep this consistent, we need to encode the labels this way at the data cleaning stage.
We get [0.7, 0.5, 0.9] - the model is 70% sure that the first sample is a cat, 50% sure that the second sample is a dog, and so on.
With targets encoded like this, categorical cross entropy is basically the negative logarithm of the confidence predicted for the correct class, or -log(x). And "loss" is the average (mean) negative logarithm across the samples in the batch. What would the loss and accuracy be for this batch of 3 samples?
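To see why a single negative log is enough, here is a minimal sketch (the names `prediction`, `oneHotTarget`, `fullSum` and `fullLoss` are purely illustrative, and it relies on the same `log` function used in the snippets below): with a one-hot target, every term of the full cross-entropy sum except the one at the target index is multiplied by 0, so the whole sum collapses to -log of the target confidence.
// The full categorical cross-entropy for one sample is
// -(target[0]*log(pred[0]) + target[1]*log(pred[1]) + target[2]*log(pred[2])).
let prediction = [0.7, 0.1, 0.2]   // Softmax output for the first sample
let oneHotTarget = [1.0, 0.0, 0.0] // class 0 (cat) written as a one-hot vector
var fullSum = 0.0
for (i in 0..prediction.size) {
    fullSum += oneHotTarget[i] * log(prediction[i])
}
let fullLoss = -fullSum // identical to the shortcut -log(prediction[0])
>>> 0.356675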
First, we get the confidences.
var confidencesList = ArrayList<Float64>([])
for ((targIdx, distribution) in classTargets |> zip(softmaxOutputs)) {
confidencesList.append(distribution[targIdx])
}
>>> [0.700000, 0.500000, 0.900000]
Second, we calculate the negative logarithm of each value in the confidences array. Because -log(0) is `inf` (infinity), a single confidence of exactly 0 would make the loss blow up. We could avoid that by adding a very small number to every prediction, but then a confidence of 1 would end up slightly above 1, where the negative log turns negative (and loss cannot be negative). So instead we `clip` the confidences: a prediction of 0 is raised by a very small number, a prediction of 1 is lowered by the same amount, and everything in between is left untouched. The book recommends using 1e-7, which is 0.0000001.
let negLog = confidencesList |> map {i => -log(clamp(i, 1e-7, 1.0 - 1e-7))} |> collectArray
>>> [0.356675, 0.693147, 0.105361]
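To answer the loss part of the question above, here is a minimal sketch of the final averaging step (it assumes the `negLog` array from the previous snippet; `sumNegLog` and `meanLoss` are just illustrative names): the batch loss is the mean of the per-sample negative logs.
var sumNegLog = 0.0
for (nl in negLog) {
    sumNegLog += nl
}
let meanLoss = sumNegLog / Float64(negLog.size) // (0.356675 + 0.693147 + 0.105361) / 3
>>> 0.385061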
Because w