# Machine Learning: Week 2

## Multivariate linear regression

Linear regression with multiple variables is also known as “multivariate linear regression”.

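### Notation

- $m$: number of training examples; $n$: number of features.
- $x^{(i)}$: feature vector of the $i$-th training example; $x_j^{(i)}$: value of feature $j$ in that example.
- $x_0 = 1$ by convention, so that $x$ and $\theta$ both have $n + 1$ entries.

### Hypothesis

Formula, in its standard form:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$

With the squared-error cost $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, the partial derivative with respect to $\theta_j$ is:

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

### Feature scaling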

If the scales of the features differ widely, gradient descent may take a long time to converge, so the features should be scaled.

After scaling, the contours of the cost function look roughly circular.
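
One common way to scale features is mean normalization: subtract each feature's mean and divide by its range. The plain-Python sketch below assumes the data is a list of rows; the function name `mean_normalize` and the house-size numbers are made up for illustration.

```python
def mean_normalize(X):
    """Scale each feature column to (x - mean) / (max - min).

    X is a list of rows; each row is a list of feature values.
    Returns the scaled rows plus the per-feature means and ranges,
    which have to be reused when scaling any new input.
    """
    n = len(X[0])
    columns = [[row[j] for row in X] for j in range(n)]
    means = [sum(col) / len(col) for col in columns]
    # A constant feature has range 0; fall back to 1.0 to avoid dividing by zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in columns]
    scaled = [[(row[j] - means[j]) / ranges[j] for j in range(n)] for row in X]
    return scaled, means, ranges


# Hypothetical data: house size in square feet, number of bedrooms.
X = [[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]]
X_scaled, means, ranges = mean_normalize(X)
```

After this step every column has mean 0 and all values lie within [-1, 1], so no single feature dominates the gradient.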

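Problem: is the result gradient descent converges to on scaled features the same as on unscaled features? For unregularized linear regression the values of $\theta$ differ, because they multiply rescaled inputs, but the predictions agree as long as new inputs are scaled the same way.

If the range of a feature is already close to [-1, 1], it is fine as is. When the range is much larger or much smaller, the feature should be scaled.

Mean normalization:

$$x_j := \frac{x_j - \mu_j}{s_j}$$

where $\mu_j$ is the mean of feature $j$ over the training set and $s_j$ is its range (max minus min) or its standard deviation.

### Learning rate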
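
A minimal plain-Python sketch of batch gradient descent that records the cost after every iteration, which is the data a cost-versus-iterations plot needs. The toy data, the function names, and the two values of `alpha` are assumptions chosen for illustration only.

```python
def predict(theta, x):
    """Hypothesis h_theta(x) = theta^T x, with x[0] == 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum of squared errors."""
    m = len(X)
    return sum((predict(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Run batch gradient descent from theta = 0 and log J after each step."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(iterations):
        errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history


# Toy data generated from y = 1 + 2x; the first column is x0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta, history = gradient_descent(X, y, alpha=0.1, iterations=2000)
_, diverging = gradient_descent(X, y, alpha=1.0, iterations=20)
```

On this data `history` shrinks steadily with `alpha=0.1`, while `diverging` grows with `alpha=1.0`. Plotting either list against the iteration index gives the diagnostic curve.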

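Check that gradient descent works correctly by plotting $J(\theta)$ against the number of iterations: $J(\theta)$ should decrease on every iteration. $\alpha$ can't be too small (convergence is slow) or too big ($J(\theta)$ may fail to decrease, or may diverge); the same plot shows which case applies.

### Polynomial regression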

We can combine multiple features into one (for example, the product of two features), or fit a polynomial function of a feature.
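
As a small sketch of the polynomial case, each power of the input can be treated as its own feature. The helper name `polynomial_features` and the input values are hypothetical.

```python
def polynomial_features(x, degree):
    """Map a scalar x to the feature list [x, x**2, ..., x**degree]."""
    return [x ** d for d in range(1, degree + 1)]


# Hypothetical input values for a single raw feature.
sizes = [10.0, 50.0, 100.0]

# Each row is [x, x^2, x^3]; a model fitted on these rows is still linear in theta.
X_poly = [polynomial_features(s, 3) for s in sizes]
```

Here the third column reaches 1,000,000 while the first only reaches 100, so the columns end up on very different scales.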

Treating each power as a separate feature ($x_1 = x$, $x_2 = x^2$, ...) converts the problem back into linear regression. The ranges of these new features may differ widely, so feature scaling is needed.

## Computing parameters analytically

### Normal Equation

Solve for the optimal $\theta$ analytically, in one step.
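
As a rough sketch, assuming NumPy is available, the one-step solution amounts to solving the linear system $X^T X \theta = X^T y$. The helper name `normal_equation` and the toy data are illustrative assumptions.

```python
import numpy as np


def normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta.

    X is the (m, n+1) design matrix, including the leading column of ones.
    Assumes X^T X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


# Toy data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
```

Using `np.linalg.solve` avoids forming the inverse explicitly. No learning rate or iteration count is involved.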

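Matrix setup: $X$ is the $m \times (n+1)$ design matrix whose $i$-th row is $(x^{(i)})^T$ with $x_0^{(i)} = 1$, and $y$ is the $m$-vector of targets.

Derivation of $\theta$ (see the Zhihu column): setting the gradient of $J(\theta)$ to zero gives $X^T X \theta = X^T y$, so

$$\theta = (X^T X)^{-1} X^T y$$

The normal equation doesn't need feature scaling.

When the number of features is small, the normal equation is better. Otherwise, gradient descent is better, because inverting the $(n+1) \times (n+1)$ matrix $X^T X$ costs roughly $O(n^3)$.

What if $X^T X$ is not invertible?

`pinv()` in Octave returns a value even if its argument is not invertible; `inv()` does not.

Reasons why $X^T X$ may not be invertible:

1. Redundant features (linearly dependent columns).
2. Too many features ($m \le n$).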
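
The redundant-feature case can be sketched with NumPy, whose `np.linalg.pinv` plays the role of Octave's `pinv()` here. The data is made up, and the explicit `rcond` cutoff is an assumption added so that rounding noise in the zero singular value is reliably discarded.

```python
import numpy as np

# Hypothetical data: x2 is an exact multiple of x1, so it is redundant.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 3.0 + 5.0 * x1

A = X.T @ X
# A is 3x3 but only has rank 2, so it has no ordinary inverse.
rank = np.linalg.matrix_rank(A)

# The pseudo-inverse still yields a (minimum-norm) solution.
# rcond: singular values below rcond * largest are treated as zero.
theta = np.linalg.pinv(A, rcond=1e-10) @ X.T @ y
predictions = X @ theta
```

The individual entries of `theta` are not unique in this situation, but the predictions still reproduce `y`. Dropping the redundant column is the usual way to restore invertibility.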