Multivariate linear regression
Linear regression with multiple variables is also known as “multivariate linear regression”.
Notation
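The notation here is presumably the standard one for this material:

n = number of features
m = number of training examples
x^(i) = the input features of the i-th training example
x_j^(i) = the value of feature j in the i-th training example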
Hypothesis
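Presumably the standard multivariate form, with x_0 = 1 by convention:

h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n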
Gradient descent
Formula:
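Presumably the standard simultaneous update, repeated until convergence for every j = 0, ..., n:

θ_j := θ_j - α · ∂J(θ)/∂θ_j

where J(θ) = (1/(2m)) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2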
Expanding the partial derivative, the per-feature update becomes:
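θ_j := θ_j - α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x_j^(i)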
Feature scaling
If the scales of the features differ widely, gradient descent may take a long time to converge, so the features should be scaled.
After scaling, the contours of the cost function look more like circles, and gradient descent takes a more direct path to the minimum.
Question: is the converged result the same with and without scaling? (At the optimum the predictions are equivalent; only the θ values differ, because they absorb the rescaling of the features.)
If a feature's range is already roughly [-1, 1], it is fine as is; when the range is much larger or smaller than that, the feature should be scaled.
Mean normalization:
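Presumably the standard formula, where μ_j is the mean of feature j and s_j is its range (max - min) or its standard deviation:

x_j := (x_j - μ_j) / s_j

A minimal Octave sketch, assuming X is an m-by-n matrix of raw features (variable names are illustrative):

    mu = mean(X);            % 1-by-n row vector of column means
    s  = std(X);             % 1-by-n row vector of column standard deviations
    X_norm = (X - mu) ./ s;  % broadcast: normalize each column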
Learning rate
Check whether gradient descent is working correctly by plotting the value of J(θ) against the iteration number; J(θ) should decrease on every iteration.
The learning rate α can be neither too small (convergence becomes very slow) nor too large (J(θ) may fail to decrease, or even diverge); the plot shows which case we are in.
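A minimal sketch of this check in Octave, assuming X (with the x_0 column of ones), y, theta, alpha, and num_iters are already defined, and computeCost is a hypothetical helper that evaluates J(θ):

    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        grad  = (1/m) * X' * (X*theta - y);   % vectorized gradient of J
        theta = theta - alpha * grad;         % simultaneous update of all theta_j
        J_history(iter) = computeCost(X, y, theta);
    end
    plot(1:num_iters, J_history);             % should decrease every iteration
    xlabel('iteration'); ylabel('J(theta)');

If the curve rises or oscillates, α is too large; if it falls very slowly, α is too small.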
Polynomial regression
We can combine multiple features into one (e.g., frontage and depth into a single area feature), or use a polynomial function of a feature.
Either way this converts the problem back into linear regression, but the ranges of the derived variables may differ widely, so feature scaling is needed.
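For example, with a single raw feature x (say, house size), a cubic model

h_θ(x) = θ_0 + θ_1·x + θ_2·x^2 + θ_3·x^3

becomes linear regression over the derived features x_1 = x, x_2 = x^2, x_3 = x^3. In Octave, assuming x is a column vector, that is just

    X_poly = [x, x.^2, x.^3];   % each column is one derived feature

If x ranges up to 1,000, then x^2 ranges up to 10^6 and x^3 up to 10^9, hence the need for scaling.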
Computing parameters analytically
Normal Equation
Solve for the optimal θ analytically in one step.
Matrix setup:
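Presumably the standard setup: each training example (with x_0 = 1 prepended) becomes one row of an m-by-(n+1) design matrix X, and the targets are stacked into an m-vector y:

X = [ (x^(1))^T ; (x^(2))^T ; ... ; (x^(m))^T ],   y = [ y^(1) ; y^(2) ; ... ; y^(m) ]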
Derivation of θ: Zhihu column.
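The resulting closed-form solution is:

θ = (X^T X)^{-1} X^T y

In Octave this is a one-liner (pinv is preferred over inv; see below):

    theta = pinv(X' * X) * X' * y;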
The normal equation does not need feature scaling.
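The comparison that presumably belongs here is the standard one:

Gradient descent: must choose α; needs many iterations; works well even when n is large.
Normal equation: no α to choose; no iterations; must compute (X^T X)^{-1}, which is roughly O(n^3), so it is slow when n is very large.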
So when the number of features n is small (roughly up to 10,000), the normal equation is better; otherwise gradient descent is better.
What if X^T X is not invertible?
pinv() in Octave still returns a result even when the matrix is not invertible; inv() does not.
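A minimal Octave illustration, using a deliberately singular matrix (the second column is twice the first, so the rank is 1):

    A = [1 2; 2 4];
    pinv(A)    % returns the Moore-Penrose pseudoinverse
    inv(A)     % warns that the matrix is singular to machine precision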
The usual reasons why X^T X is not invertible: 1. Redundant features (linearly dependent columns, e.g., the same size expressed in both feet and meters); 2. Too many features (m ≤ n); the fix is to delete some features or use regularization.