Machine Learning: Week 1

Supervised learning

Predict outputs from a dataset whose correct answers are already known.

Supervised learning problems are categorized into “regression” and “classification” problems.

House price prediction

This is a regression problem: the predicted value is continuous.

Fit a regression model to the historical data; linear regression or quadratic regression are both reasonable choices.

Then use the fitted model to make predictions.
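A minimal sketch of both fits using NumPy's `polyfit`; the house sizes and prices here are made-up illustrative numbers, not course data:

```python
import numpy as np

# Hypothetical data: size in square feet vs. price in $1000s.
sizes = np.array([650, 785, 1200, 1500, 1850, 2100, 2400])
prices = np.array([120, 150, 230, 285, 340, 395, 440])

# Linear fit: price ≈ theta0 + theta1 * size
theta_lin = np.polyfit(sizes, prices, deg=1)

# Quadratic fit: price ≈ a*size^2 + b*size + c
theta_quad = np.polyfit(sizes, prices, deg=2)

# Predict the price of a 2000 sq ft house with each model.
print(np.polyval(theta_lin, 2000))
print(np.polyval(theta_quad, 2000))
```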


Breast cancer

This is a classification problem: the predicted value is discrete.

Predict whether a tumor is benign or malignant based on its size.


When there are two features, tumor size and age, the data can be plotted in two dimensions and separated by a decision boundary:
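As an illustration, a hypothetical two-feature classifier; logistic regression (covered later in the course) is one standard choice, and the data below is invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tumor data: columns are [tumor size (cm), patient age].
X = np.array([[1.2, 35], [2.8, 50], [1.5, 42], [4.1, 60],
              [3.6, 55], [0.9, 30], [4.8, 65], [2.0, 45]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])  # 0 = benign, 1 = malignant

clf = LogisticRegression().fit(X, y)
# Classify a new tumor: 3.0 cm, patient aged 52.
print(clf.predict([[3.0, 52]]))        # predicted class
print(clf.predict_proba([[3.0, 52]]))  # class probabilities
```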


Unsupervised learning

Find structure in a dataset for which no “right answers” (labels) are given.

Clustering problem

Cluster the data into two groups.
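A minimal sketch using k-means from scikit-learn with k = 2; the points are invented:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points forming two loose groups.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# Ask k-means for two clusters; no labels are provided.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the two cluster centroids
```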

An application: Google News clusters articles from different sites that cover the same topic.

Applications of unsupervised learning:

  • Clustering servers in a data center to improve efficiency
  • Clustering users in a social network
  • Market segmentation
  • Astronomical data analysis

Cocktail party problem: separate individual voices from a mixture of sounds recorded in a noisy environment.
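A sketch of the unmixing idea using Independent Component Analysis (scikit-learn's `FastICA`), one standard technique for this kind of source separation; the “recordings” below are synthetic waves, not real audio:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 2000)

# Two hypothetical source signals (stand-ins for two speakers).
s1 = np.sin(2 * np.pi * 5 * t)
s2 = np.sign(np.sin(2 * np.pi * 3 * t))
S = np.column_stack([s1, s2])

# Two "microphones" each record a different mixture of the sources.
A = np.array([[1.0, 0.5], [0.5, 1.0]])  # mixing matrix
X = S @ A.T

# ICA tries to recover the original sources from the mixtures.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)  # estimated sources (up to scale/order)
```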

Model and Cost Function

The model: a training set is fed to a learning algorithm, which outputs a hypothesis h; h maps an input x (e.g. house size) to a predicted output y (e.g. price).
Why is h called a “hypothesis”? The term was inherited from earlier machine-learning research; it may not be the best name, but it is just standard terminology.

Cost Function

θ₀ and θ₁ are the parameters of the hypothesis h: h_θ(x) = θ₀ + θ₁·x.
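In code, the hypothesis is just a straight line in x:

```python
def h(theta0, theta1, x):
    """Hypothesis for univariate linear regression: theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(2.0, 0.5, 10))  # 7.0
```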

Notation I didn’t know before:
#: the hash sign, meaning “the number of” something.
1/2m: read aloud as “one over two m”.

The cost function J of linear regression, also called the squared error function:

J(θ₀, θ₁) = (1/2m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

The mean is halved (1/2m) as a convenience for gradient descent: differentiating the squared term produces a factor of 2 that cancels the 1/2.
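A direct translation of J into NumPy, assuming the training set is given as arrays x and y:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared error cost: J = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Tiny check: a perfect fit y = 1 + 2x gives zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(1.0, 2.0, x, y))  # 0.0
print(compute_cost(0.0, 0.0, x, y))  # 83/6 ≈ 13.83
```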

Intuition of cost function:

One parameter (fix θ₀ = 0): J(θ₁) is a bowl-shaped convex curve with a single minimum.

Two parameters: J(θ₀, θ₁) is a bowl-shaped surface, usually visualized as a 3-D surface or with contour plots.

Parameter learning

Gradient Descent

Initialize θ₀ and θ₁, then repeatedly update both values until J reaches a local minimum.

Different starting positions can lead to different local optima.


:= is assignment; = is a truth assertion.
The update rule is θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ for j = 0 and j = 1, where α is the learning rate. The right-hand sides must be computed for both parameters before either is assigned, i.e. the two parameters are updated simultaneously.
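A sketch of the simultaneous update in NumPy, using the linear-regression derivatives worked out in the next section; the data is invented:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
m = len(y)
theta0, theta1, alpha = 0.0, 0.0, 0.1

err = theta0 + theta1 * x - y  # h(x) - y, using the OLD parameters

# Correct: both new values are computed from the same old parameters.
temp0 = theta0 - alpha * np.sum(err) / m
temp1 = theta1 - alpha * np.sum(err * x) / m
theta0, theta1 = temp0, temp1

# Wrong (non-simultaneous): assigning theta0 first would make the
# theta1 update see the NEW theta0 -- a subtly different algorithm.
```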

Gradient Descent Intuition

Case of one parameter: when the derivative is positive, the update decreases θ₁; when it is negative, the update increases θ₁. Either way θ₁ moves toward the minimum. If α is too small, descent is slow; if too large, it can overshoot and fail to converge.

As θ approaches the minimum, the derivative shrinks, so the steps automatically get smaller and smaller; there is no need to decrease α over time.
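A tiny demo with J(θ) = θ², whose derivative is 2θ: with α fixed, every step is smaller than the last because the gradient itself shrinks near the minimum:

```python
theta, alpha = 4.0, 0.1
for i in range(5):
    step = alpha * 2 * theta  # gradient 2*theta shrinks as theta -> 0
    theta -= step
    print(f"step {i}: moved {step:.4f}, theta = {theta:.4f}")
# Each printed step is smaller than the previous, with alpha unchanged.
```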

Gradient Descent for Linear Regression

Calculate the partial derivatives of J:

∂J/∂θ₀ = (1/m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
∂J/∂θ₁ = (1/m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾

Gradient descent algorithm for linear regression:

repeat until convergence {
  θ₀ := θ₀ − α · (1/m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θ₁ := θ₁ − α · (1/m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
} (updating θ₀ and θ₁ simultaneously)
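Putting it together, a minimal NumPy implementation of the loop above; the data is invented and roughly follows y = 1 + 2x:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        err = theta0 + theta1 * x - y
        grad0 = np.sum(err) / m      # dJ/dtheta0
        grad1 = np.sum(err * x) / m  # dJ/dtheta1
        # Simultaneous update: both gradients use the old parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(gradient_descent(x, y))  # converges near (theta0 ≈ 1, theta1 ≈ 2)
```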

Process of gradient descent for linear regression: each update moves (θ₀, θ₁) downhill on J, and the fitted line gradually comes to match the training data.

Batch gradient descent: every update step uses all m training examples (as in the sketch above), which is computationally expensive when the dataset is large.