Machine Learning: Week 2

Multivariate linear regression

Linear regression with multiple variables is also known as “multivariate linear regression”.

Notation

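The standard notation used for the rest of this week:

n: the number of features
m: the number of training examples
x^(i): the input (feature vector) of the i-th training example
x_j^(i): the value of feature j in the i-th training example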

Hypothesis

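With n features and the convention x_0 = 1, the hypothesis is

$$ h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x $$

where theta and x are both (n+1)-dimensional vectors.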

Gradient descent

Formula:

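Repeat until convergence, updating every j = 0, ..., n simultaneously:

$$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) $$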

Expanding the partial derivative term gives a concrete update rule.

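With the squared-error cost $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2$, that is:

$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)} $$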

Feature scaling

If the scales of the features differ widely, gradient descent can take a long time to converge, so the features should be scaled.

After scaling, the contours of the cost function look more like circles, so gradient descent takes a more direct path to the minimum.

Problem: is the result of convergence with scaled features the same as without scaling? (It converges to the same hypothesis; only the values of theta change with the parameterization.)


If a feature's range is roughly within [-1, 1], it is fine as is; when the range is much larger or much smaller than that, the feature should be scaled.


Mean normalization:

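The usual formula subtracts the mean and divides by the spread of each feature:

$$ x_j := \frac{x_j - \mu_j}{s_j} $$

where mu_j is the mean of feature j over the training set and s_j is either the range (max minus min) or the standard deviation. A minimal Octave sketch, assuming X is the m x n matrix of raw features without the bias column (the subtraction and division rely on Octave's automatic broadcasting):

```octave
mu = mean(X);                  % 1 x n vector of feature means
sigma = std(X);                % 1 x n vector of standard deviations
X_norm = (X - mu) ./ sigma;    % each column now has mean 0 and std 1
```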

Learning rate

To check that gradient descent is working correctly, plot the cost J(theta) against the iteration number; J(theta) should decrease on every iteration and eventually flatten out.


Alpha can be neither too small (convergence becomes very slow) nor too large (J(theta) may fail to decrease on every iteration, or even diverge); the J(theta)-versus-iteration plot shows which case we are in.

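A minimal Octave sketch of this check; the X and y below are made-up toy data, and X already includes the leading column of ones:

```octave
% Toy data: 5 examples, one feature plus the bias column
X = [ones(5, 1), (1:5)'];
y = [2; 4; 6; 8; 10];

alpha = 0.01;        % learning rate
num_iters = 400;
m = length(y);
theta = zeros(size(X, 2), 1);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  % Simultaneous update of all theta_j
  theta = theta - (alpha / m) * (X' * (X * theta - y));
  % Record the cost so it can be plotted against the iteration number
  J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
end

% J(theta) should decrease on every iteration; if it grows, alpha is too large
plot(1:num_iters, J_history, '-b');
xlabel('Number of iterations');
ylabel('Cost J(theta)');
```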

Polynomial regression

We can combine several features into one (for example, replace frontage and depth with a single area feature), or fit a polynomial function of a feature.

Treating the powers of a feature as separate features converts the problem back into linear regression, but the ranges of those derived features differ widely, so feature scaling is needed.

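For example, with house size x as the only raw feature, a cubic model

$$ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 $$

is just linear regression on the derived features x_1 = x, x_2 = x^2, x_3 = x^3; if x ranges from 1 to 1,000, then x^2 ranges up to 10^6 and x^3 up to 10^9, which is why scaling matters here.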

Computing parameters analytically

Normal Equation

The normal equation solves for the optimal theta analytically in a single step, with no iterations.

Matrix setup:

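With the design matrix X (m x (n+1), one training example per row, first column all ones) and the target vector y (m x 1), the closed-form solution is:

$$ \theta = (X^T X)^{-1} X^T y $$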

Derivation of theta (the linked Zhihu column gives the full derivation).
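In outline: write the cost as $J(\theta) = \frac{1}{2m} \lVert X\theta - y \rVert^2$, set its gradient $\frac{1}{m} X^T (X\theta - y)$ to zero, which gives $X^T X \theta = X^T y$ and hence $\theta = (X^T X)^{-1} X^T y$ whenever X^T X is invertible.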

The normal equation does not need feature scaling.


When the number of features is small (as a rough rule of thumb, up to around 10,000), the normal equation is the better choice; when it is large, gradient descent is better, since computing (X^T X)^{-1} costs roughly O(n^3).

What if X^T X is not invertible?

pinv() in Octave still returns a result (the pseudo-inverse) even when the matrix is not invertible; inv() does not.
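A one-line Octave sketch of the normal equation using the pseudo-inverse (X and y set up as in the matrix form above):

```octave
theta = pinv(X' * X) * X' * y;   % still returns an answer when X' * X is singular
```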

The reasons why X^T X may not be invertible: 1. Redundant features (linearly dependent columns); 2. Too many features (more features than training examples). The usual fixes are deleting some features or using regularization.