With the loss defined, we now have to compute its gradient with respect to the output neurons of the CNN in order to backpropagate it through the net and optimize the loss function by tuning the net parameters. The loss terms coming from the negative classes are zero. However, the loss gradient with respect to those negative classes does not vanish, since the Softmax of the positive class also depends on the scores of the negative classes.
The gradient expression will be the same for all classes in \(C\) except for the ground truth class \(C_p\), because the score of \(C_p\) (\(s_p\)) is in the numerator of the Softmax.
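To make the two cases concrete, here is a minimal sketch in plain Python. It assumes the loss for one sample is \(-\log\) of the Softmax of the positive class score, for which the gradient with respect to \(s_p\) is the Softmax output minus 1, and the gradient with respect to any negative class score is just its Softmax output. The function names and the example scores are made up for illustration, and a finite-difference check is included only as a sanity test.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(scores, positive):
    """Cross-entropy loss for one sample; `positive` is the ground truth class index."""
    return -math.log(softmax(scores)[positive])

def ce_grad(scores, positive):
    """Analytic gradient: softmax - 1 for the positive class, softmax for the negatives."""
    probs = softmax(scores)
    return [p - 1.0 if i == positive else p for i, p in enumerate(probs)]

def numeric_grad(scores, positive, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += eps
        down = list(scores)
        down[i] -= eps
        grad.append((ce_loss(up, positive) - ce_loss(down, positive)) / (2 * eps))
    return grad

scores = [2.0, 1.0, -0.5]  # hypothetical CNN output scores for 3 classes
positive = 0               # hypothetical ground truth class index

analytic = ce_grad(scores, positive)
numeric = numeric_grad(scores, positive)
print(analytic)
print(numeric)
```

The negative classes receive a non-zero (positive) gradient even though their loss terms are zero, which is exactly the effect described above: lowering their scores raises the Softmax of the positive class.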
- Caffe: SoftmaxWithLoss Layer. Limited to multi-class classification.
- PyTorch: CrossEntropyLoss. Limited to multi-class classification.
- TensorFlow: softmax_cross_entropy. Limited to multi-class classification.
In this Facebook work they claim that, despite being counter-intuitive, Categorical Cross-Entropy loss, or Softmax loss, worked better than Binary Cross-Entropy loss in their multi-label classification problem.
> Skip this part if you are not interested in Facebook or me using Softmax Loss for multi-label classification, which is not standard.
When Softmax loss is used in a multi-label scenario, the gradients get a bit more complex, since the loss contains an element for each positive class. Let \(M\) be the positive classes of a sample.
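As a rough sketch of how this could look, the plain Python below assumes the multi-label Softmax loss is the average of \(-\log\) Softmax over the positive classes of the sample. That averaging is an assumption made here for illustration (a plain sum would only rescale the gradient), and the function names and scores are hypothetical. Under that assumption, each positive class gets a gradient of its Softmax output minus one over the number of positives, and each negative class gets its Softmax output.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_softmax_loss(scores, positives):
    """Assumed loss: mean of -log(softmax) over the positive class indices."""
    probs = softmax(scores)
    return -sum(math.log(probs[p]) for p in positives) / len(positives)

def multilabel_softmax_grad(scores, positives):
    """Gradient under the assumed loss: softmax - 1/len(positives) for positives,
    softmax for negatives."""
    probs = softmax(scores)
    share = 1.0 / len(positives)
    return [p - share if i in positives else p for i, p in enumerate(probs)]

def numeric_grad(scores, positives, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += eps
        down = list(scores)
        down[i] -= eps
        diff = multilabel_softmax_loss(up, positives) - multilabel_softmax_loss(down, positives)
        grad.append(diff / (2 * eps))
    return grad

scores = [1.5, 0.2, -1.0, 0.7]  # hypothetical scores for 4 classes
positives = [0, 3]              # hypothetical positive class indices

grad = multilabel_softmax_grad(scores, positives)
check = numeric_grad(scores, positives)
print(grad)
print(check)
```

With a single positive class this reduces to the usual multi-class gradient, which is a quick way to sanity-check the formulation.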