Multivariate linear regression
Linear regression with multiple variables is also known as “multivariate linear regression”.
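The partial derivative of the cost function J(theta) with respect to each parameter theta_j is:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
Gradient descent uses it to update every theta_j (j = 0, ..., n) simultaneously:
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$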
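The update that follows from this partial derivative can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `predict` and `gradient_step`, the toy data, and the choice of alpha = 0.1 are all assumptions made for the example.

```python
def predict(theta, x):
    """Hypothesis h_theta(x) = theta_0 * x_0 + ... + theta_n * x_n, with x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient_step(theta, X, y, alpha):
    """One batch gradient descent step that updates every theta_j simultaneously."""
    m = len(X)
    # Prediction error for each training example, computed with the OLD theta.
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        # Partial derivative of J with respect to theta_j.
        grad_j = sum(e * xi[j] for e, xi in zip(errors, X)) / m
        new_theta.append(theta[j] - alpha * grad_j)
    return new_theta


# Toy data (made up for illustration): y = 1 + 2*x1 + 3*x2, with x_0 = 1 prepended.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]

theta = [0.0, 0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)
```

All errors are computed before any parameter changes, which is what makes the update simultaneous.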
If the scales of the features differ widely, gradient descent may take a long time to converge, so the features should be scaled.
After scaling, the contours of the cost function look closer to circles.
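Question: is the result reached with scaled features the same as the result reached without scaling? The parameter values differ, because they are expressed in the scaled units, but the predictions are the same as long as new inputs are scaled with the same constants.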
If a feature already lies roughly in the range [-1, 1], it can be left as it is. If its range is much larger or much smaller than that, it should be scaled.
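One common way to bring features into a range like this is mean normalization, sketched below. The function name `scale_features` and the house data are assumptions made for the example. The sketch also assumes no column is constant, since a constant column would cause a division by zero, so the x_0 = 1 column should be left out of the scaling.

```python
def scale_features(rows):
    """Mean-normalize each column: (x - mean) / (max - min)."""
    n = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    spans = [max(r[j] for r in rows) - min(r[j] for r in rows) for j in range(n)]
    scaled = [[(r[j] - means[j]) / spans[j] for j in range(n)] for r in rows]
    # Return the constants too, so new inputs can be scaled the same way.
    return scaled, means, spans


# Made-up example: house size in square feet and number of bedrooms.
houses = [[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0], [3000.0, 5.0]]
scaled, means, spans = scale_features(houses)
for row in scaled:
    print(row)
```

Keeping `means` and `spans` matters: a new example has to be scaled with the constants computed on the training set before it is passed to the hypothesis.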
To check that gradient descent is working correctly, plot the cost J(theta) against the number of iterations. J should decrease after every iteration.
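Alpha (the learning rate) must be neither too small nor too large. If it is too small, convergence is slow; if it is too large, J may fail to decrease on every iteration and may even diverge. The plot of J against the number of iterations shows which case applies.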
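The check can also be done numerically by recording J after every iteration for a few candidate values of alpha. The following is a rough sketch: the names `cost` and `cost_history`, the toy data, and the three alpha values are assumptions chosen for illustration.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(t * xj for t, xj in zip(theta, xi))
        total += (prediction - yi) ** 2
    return total / (2 * m)


def cost_history(X, y, alpha, iterations):
    """Run batch gradient descent from theta = 0 and record J after each step."""
    m = len(X)
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(iterations):
        errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
        history.append(cost(theta, X, y))
    return history


# Toy data (made up): y = 1 + 2 * x1, with x_0 = 1 prepended.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

for alpha in (0.001, 0.1, 1.0):
    history = cost_history(X, y, alpha, 50)
    decreasing = all(later <= earlier for earlier, later in zip(history, history[1:]))
    print(f"alpha={alpha}: final J={history[-1]:.3g}, always decreasing={decreasing}")
```

On this particular data set the smallest alpha lowers J only slowly, the middle one converges, and the largest one makes J grow instead of shrink. Plotting each `history` list gives the curve described above.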
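We can combine multiple features into one (for example, replacing the width and depth of a plot by its area), or fit a polynomial function of a feature.
A polynomial model can be converted into linear regression by treating each power of the feature as a separate feature. The ranges of these new features may differ widely, so feature scaling is needed.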
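A small sketch of this conversion shows how quickly the ranges drift apart. The helper names `polynomial_features` and `mean_normalize`, the cubic degree, and the sample values are assumptions made for the example.

```python
def polynomial_features(x, degree):
    """Map one value x to the feature list [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]


def mean_normalize(rows):
    """Scale each column to (x - mean) / (max - min)."""
    n = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    spans = [max(r[j] for r in rows) - min(r[j] for r in rows) for j in range(n)]
    return [[(r[j] - means[j]) / spans[j] for j in range(n)] for r in rows]


sizes = [10.0, 20.0, 50.0, 100.0]  # made-up values of a single feature
rows = [polynomial_features(s, 3) for s in sizes]

# Range (max - min) of each new feature before scaling.
spans = [max(r[j] for r in rows) - min(r[j] for r in rows) for j in range(3)]
print(spans)

scaled = mean_normalize(rows)
print(scaled)
```

Each row of `scaled` can then be used as the input of an ordinary linear regression with three features.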
Computing parameters analytically
The optimal theta can be solved for analytically in a single step, without iterating.
Derivation of theta: Zhihu column (知乎专栏)
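For reference, the standard derivation can be outlined in a few lines. It assumes the usual conventions, which are not spelled out in these notes: X is the m-by-(n+1) design matrix whose first column is all ones, y is the vector of m targets, and the cost carries a factor of 1/(2m).

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m} (X\theta - y)^{T} (X\theta - y) \\
\nabla_{\theta} J(\theta) &= \frac{1}{m} X^{T} (X\theta - y) = 0 \\
X^{T} X \theta &= X^{T} y \\
\theta &= (X^{T} X)^{-1} X^{T} y
\end{aligned}
```

The last step is only valid when X^T X is invertible.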
The normal equation, theta = (X^T X)^(-1) X^T y, does not need feature scaling.
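When the number of features is small, the normal equation is the better choice. Otherwise, gradient descent is better, because the normal equation has to invert X^T X, a matrix whose size grows with the number of features.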
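The normal equation can be tried out on unscaled data with a short dependency-free sketch. Instead of forming the inverse explicitly, it solves the linear system (X^T X) theta = X^T y by Gaussian elimination, which gives the same theta when X^T X is invertible. The function name `normal_equation`, the singularity threshold, and the toy data are assumptions made for the example.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gauss-Jordan elimination with partial pivoting."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (not invertible)")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Unscaled toy data (made up): price = 50 + 0.1 * size + 20 * bedrooms.
X = [[1.0, 1000.0, 2.0], [1.0, 1500.0, 3.0], [1.0, 2000.0, 3.0], [1.0, 2500.0, 5.0]]
y = [50 + 0.1 * size + 20 * beds for _, size, beds in X]

theta = normal_equation(X, y)
print(theta)
```

The size column is in the thousands and the bedroom column is in single digits, yet no scaling or learning rate is needed to recover the parameters.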
What if X^T X is not invertible?
In Octave, pinv() (the pseudoinverse) returns a usable result even if its argument is not invertible; inv() does not.
Reasons why X^T X may not be invertible: 1. Redundant (linearly dependent) features; 2. Too many features relative to the number of training examples.
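Both causes can be checked on tiny made-up matrices by computing the determinant of X^T X, which is zero exactly when the matrix is not invertible. The helper names `gram` and `det3` and all data values are assumptions made for the example. Integers are used so that the arithmetic is exact.

```python
def gram(X):
    """Return X^T X for a matrix given as a list of rows."""
    n = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# 1. Redundant feature: the third column is always 3 times the second.
X_redundant = [[1, 2, 6], [1, 4, 12], [1, 5, 15], [1, 7, 21]]

# 2. Too many features: only 2 training examples for 3 parameters.
X_few_examples = [[1, 2, 5], [1, 4, 3]]

# For comparison: 4 examples with features that are not linearly dependent.
X_ok = [[1, 2, 5], [1, 4, 3], [1, 5, 8], [1, 7, 1]]

for name, X in [("redundant", X_redundant), ("few examples", X_few_examples), ("ok", X_ok)]:
    print(name, det3(gram(X)))
```

With floating-point data the determinant is rarely exactly zero, so in practice a near-singular X^T X shows up as a very small determinant or a very large condition number rather than an exact zero.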