Stanford CS231n: Deep Learning for Computer Vision

Notes

The semantic gap is the difference between how humans and models see, learn, or take actions.
- In vision, variations in scale, viewpoint, fine-grained categories, occlusion, clutter, deformation, and illumination affect a model's learning more than they would affect a human.
Hand-modeled features such as edges, corners, and image transformations do not scale to a wide variety of applications, so we learn from data instead.
L1 and L2 distances are used to compare how an input vector differs from each training example. L2 distance is less forgiving of large differences because of the squared term. When programming, we can skip the sqrt since it is a monotonic function: the ordering of the distances is unchanged, so we can sort the squared distances and use the indices of the first k values.
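The distance comparison described above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: the function name `knn_predict`, the majority-vote tie handling, and the toy data are all assumptions made for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, metric="l2"):
    """Predict the label of one flattened input x by majority vote (hypothetical helper)."""
    diff = X_train - x                        # (N, D) via broadcasting
    if metric == "l1":
        dists = np.abs(diff).sum(axis=1)      # L1: sum of absolute differences
    else:
        # Squared L2: sqrt is monotonic, so skipping it keeps the same ranking.
        dists = (diff ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k smallest distances
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data (made up): two points near the origin, two near (5, 5).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3)
pred_l1 = knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3, metric="l1")
```

Both metrics pick the same neighbors on this toy data; on real images the two rankings can differ.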
We split the data into training, validation, and test sets. The training set is used to fit the model for a given hyperparameter setting, the validation set is used to tune the hyperparameters, and the test set is used only at the end to see how the model performs in the wild.
Cross-validation divides the training data into multiple folds. For each hyperparameter setting, we hold out one fold for validation, train on the rest, and rotate through the folds; we then average the validation performance across folds and choose the setting with the best average.
Cross-validation can be computationally expensive, so in practice we often use a single validation split instead. If the validation set would be small, cross-validation is the safer choice. Typical choices are 3-fold, 5-fold, or 10-fold cross-validation. If the number of hyperparameters is large, we typically prefer a bigger validation set over cross-validation.
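A rough sketch of k-fold cross-validation for choosing k in k-NN is shown below. The helper names (`knn_accuracy`, `cross_validate`), the synthetic two-cluster data, and the candidate values of k are assumptions for illustration only.

```python
import numpy as np

def knn_accuracy(X_tr, y_tr, X_val, y_val, k):
    """Accuracy of k-NN (squared L2 distance) on one validation fold."""
    correct = 0
    for x, y in zip(X_val, y_val):
        dists = ((X_tr - x) ** 2).sum(axis=1)
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_tr[nearest], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y)
    return correct / len(y_val)

def cross_validate(X, y, k_choices, num_folds=5):
    """Return the mean validation accuracy across folds for each candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    mean_acc = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            # Fold i is held out; the remaining folds form the training set.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            accs.append(knn_accuracy(X_tr, y_tr, X_folds[i], y_folds[i], k))
        mean_acc[k] = float(np.mean(accs))
    return mean_acc

# Synthetic, well-separated two-class data (made up for the example).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

scores = cross_validate(X, y, k_choices=[1, 3, 5])
best_k = max(scores, key=scores.get)   # setting with the best average accuracy
```

Note that the selected hyperparameter is the one with the best averaged validation score, not an average of hyperparameter values.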
Since k-NN is not practical (it must keep the entire training set and compare each test input against all of it), we try the next approach, which is parametric.
- Algebraic viewpoint
  - $f(x_i, W, b) = W x_i + b$
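  - Input image $x$ of size 2×2, flattened to shape (4,)
  - $W$ has shape (3, 4)
  - $b$ has shape (3,)
  - Given $W$ of size $[K \times D]$ and $x_i$ of size $[D \times 1]$, $W x_i$ evaluates all $K$ categories at once. Each row of $W$ corresponds to one classifier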
- Template matching
  - One template per class
  - Kind of like k-NN, but here we compare against a single template image per class, which is learned from the training data
  - We use the (negative) inner product as the distance instead of L1 or L2
- Geometric Viewpoint
  - Hyperplanes cutting up space
  - Bias vector
    - Changing a row of $W$ rotates that classifier's decision boundary, while the bias term shifts it so that it better fits various data distributions
    - Influences the output without actually interacting with the input data
    - If the input $x_i$ is 0 then $W x_i = 0$ regardless of the weights, which forces every classifier's line through the origin and is not representative; the bias term avoids that scenario
- Now that we can score we need 2 things
  - Loss function to see how good or bad the scores are
  - Optimization function to change parameters to lower the loss
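The score function $f(x_i, W, b) = W x_i + b$ from the algebraic viewpoint above can be sketched with the same shapes, a 2×2 image flattened to (4,), $W$ of shape (3, 4), and $b$ of shape (3,). The pixel and weight values below are illustrative assumptions, not learned parameters.

```python
import numpy as np

# Hypothetical 2x2 grayscale image, flattened row by row to shape (4,).
image = np.array([[56.0, 231.0],
                  [24.0, 2.0]])
x = image.reshape(-1)

# One row of W per class (K = 3 classes, D = 4 pixels); values are made up.
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])

scores = W @ x + b                         # shape (3,): one score per class
predicted_class = int(np.argmax(scores))   # class with the highest score
```

Each entry of `scores` is the inner product of one row of `W` (one class template) with the image, plus that class's bias.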
Linear classification has 3 main components.
- Score function which maps input data to class scores
- Loss function which computes the error between the ground truth values and the predicted values
- Optimization process which minimizes the loss with respect to the parameters of the score function
Normalize data: center your data by subtracting the mean from every feature.
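Mean centering can be sketched as below. The toy arrays are made up, and reusing the training mean on the test split is an assumption of this sketch (a common convention, not something stated in these notes).

```python
import numpy as np

X_train = np.array([[1.0, 200.0],
                    [3.0, 220.0],
                    [5.0, 240.0]])        # (N, D): rows are examples
X_test = np.array([[2.0, 210.0]])

mean = X_train.mean(axis=0)               # per-feature mean, training data only
X_train_c = X_train - mean                # each feature now has zero mean
X_test_c = X_test - mean                  # same shift applied to the test split
```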
SVM Loss
- SVM Loss wants the correct class score to be higher than each incorrect class score by a margin of at least $\Delta$
- $L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$
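- If all the scores are roughly equal, say small random values near zero, what is the loss? Each of the $K-1$ incorrect classes contributes about $\Delta$, so the loss is about $(K-1)\Delta$, which is $K-1$ when $\Delta = 1$
  - Use this kind of loss sanity check to verify your setup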
- So if $s_{y_i} - s_j \geq \Delta$ (equivalently $s_j - s_{y_i} \leq -\Delta$), class $j$ contributes zero loss and we don't need to worry about the parameters for this class: it is not crossing the classifier's margin, so it is not making the classifier unhappy
- So if $s_{y_i}==10$
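The per-example SVM loss $L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$ can be sketched as follows. The function name `svm_loss_i` and the example scores are assumptions chosen for illustration.

```python
import numpy as np

def svm_loss_i(scores, y_i, delta=1.0):
    """Multiclass SVM (hinge) loss for a single example."""
    margins = np.maximum(0.0, scores - scores[y_i] + delta)
    margins[y_i] = 0.0                     # the sum skips j == y_i
    return float(margins.sum())

# Sanity check: with all scores equal, the loss is (K - 1) * delta.
K = 10
loss_init = svm_loss_i(np.zeros(K), y_i=3)

# Made-up scores with class 0 correct and delta = 10:
# class 1 is below the margin (contributes 0), class 2 violates it by 8.
loss_example = svm_loss_i(np.array([13.0, -7.0, 11.0]), y_i=0, delta=10.0)
```

With `K = 10` and the default `delta = 1.0`, the initial loss comes out to 9, which matches the $K-1$ check above.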
Regularization
- It helps avoid overfitting by adding a penalty on the weights to the loss function
- It helps the model generalize better
- It discourages the model from learning the noise in the data, which leads to overfitting, and from relying heavily on specific features
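As one concrete instance, the sketch below uses an L2 penalty $\lambda \sum W^2$ added to the data loss. The choice of L2, the names `l2_penalty` and `total_loss`, and the value of `lam` are assumptions of this example; the notes above do not fix a particular penalty.

```python
import numpy as np

def l2_penalty(W, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * float(np.sum(W * W))

def total_loss(data_loss, W, lam=1e-3):
    """Full objective: data loss plus the regularization penalty."""
    return data_loss + l2_penalty(W, lam)

# Two weight vectors that give the same score on x but differ in penalty.
x = np.ones(4)
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # relies on a single feature
w2 = np.full(4, 0.25)                      # spreads weight across features
same_score = bool(np.isclose(w1 @ x, w2 @ x))
p1, p2 = l2_penalty(w1, 1.0), l2_penalty(w2, 1.0)
```

Here `w2` gets the smaller penalty, which illustrates how the penalty favors weights that do not depend heavily on one feature.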
The Softmax classifier is a generalization of the binary logistic regression classifier to multiple classes.
The SVM classifier's output is difficult to interpret because the scores are not normalized and do not give an intuitive measure across classes. The Softmax classifier fixes this by producing normalized class probabilities.
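Turning raw class scores into normalized probabilities can be sketched as below. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here as an assumption; it does not change the result. The example scores are made up.

```python
import numpy as np

def softmax(scores):
    """Map a vector of class scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)      # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
big = softmax(np.array([1000.0, 1001.0]))  # stays finite thanks to the shift
```

The ordering of the classes is the same as for the raw scores; only the scale changes, so the outputs can be read as class probabilities.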

Notes

Labs