Learning Principle
In machine learning, a learning principle refers to the criterion for judging whether a model is good.
1. Loss function, Risk function and Objective function
A loss function is a function used to evaluate how well a model fits the data.
For supervised learning, the loss function evaluates the difference between the predicted output and the true output, denoted \(L(Y,f(x, \theta))\). Usually, the loss function is applied to a single sample or a subset of samples, so it cannot evaluate the overall performance of the model. To obtain that, we define: \[ R_{exp}(\theta) = E_P[L(Y,f(x, \theta))] = \int_{\mathcal{X}\times\mathcal{Y}}L(y,f(x, \theta))\,P(x,y)\,dx\,dy \] where \(R_{exp}\) is called the risk function or expected loss.
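As a concrete illustration (a minimal sketch using the squared loss, chosen only as one common example), the loss is evaluated per sample; averaging it over a handful of samples does not by itself give the expected risk, which would require the unknown joint distribution \(P(X, Y)\):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Loss L(Y, f(x, theta)) for a single sample, using the squared-error choice."""
    return (y_true - y_pred) ** 2

# Per-sample losses for a few predictions. These describe the fit on these
# samples only; the expected risk R_exp would require integrating the loss
# over the true (unknown) joint distribution P(X, Y).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.4])
print(squared_loss(y_true, y_pred))  # approximately [0.01, 0.01, 0.16]
```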
To convert an ML problem into an optimization problem, the ideal practice would be to adopt the risk function as the objective function of the optimization program. However, this requires us to know the true joint probability distribution P(X, Y), which is usually unknown in real problems (we can only estimate it from the observed dataset); this is called an ill-formed problem.
The objective function is a concept from optimization: it refers to the function the optimizer is minimizing. In many cases, the objective function of a model is the same as its risk function. However, there are also situations where the objective function cannot naturally be called a risk function (e.g., Boosting, PCA).
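For instance, PCA is usually posed directly as an optimization problem; one common formulation of its objective (reconstruction error for a centered data matrix \(X\) and an orthonormal projection \(W\)) is \[ \min_{W:\,W^\top W = I} \ \lVert X - XWW^\top \rVert_F^2, \] which is not an expected loss over a joint distribution \(P(X, Y)\).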
2. Empirical Risk Minimization
An obvious solution to the ill-formed problem is to replace the true P(X, Y) with the empirical distribution \(\hat{P}(X,Y)\) observed on the training dataset.
Supposing all samples are weighted equally, we define \[ R_{emp}(\theta) = \frac{1}{N}\sum_{n=1}^N L(y_n, f(x_n,\theta)) \] where \(R_{emp}\) is called the empirical risk.
When we use the empirical risk as our objective function, the learning principle is called empirical risk minimization (ERM).
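As a rough sketch of ERM in practice (assuming a linear model, squared loss, and plain gradient descent, all chosen purely for illustration), minimizing the empirical risk looks like this:

```python
import numpy as np

def empirical_risk(theta, X, y):
    """R_emp(theta): average squared loss over the N training samples."""
    residuals = X @ theta - y
    return np.mean(residuals ** 2)

# Synthetic training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=100)

# Minimize R_emp by gradient descent.
theta, lr = np.zeros(3), 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ theta - y) / len(y)  # gradient of R_emp
    theta -= lr * grad

print(empirical_risk(theta, X, y))  # should approach the noise level
```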
For a probabilistic model, under certain conditions, ERM is equivalent to maximum likelihood estimation (MLE). Refer to the separate article on parameter estimation.
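One way to see this equivalence (under the common assumption that the loss is taken to be the negative log-likelihood of the probabilistic model): \[ \arg\min_\theta \frac{1}{N}\sum_{n=1}^N \big(-\log P(y_n \mid x_n, \theta)\big) = \arg\max_\theta \sum_{n=1}^N \log P(y_n \mid x_n, \theta), \] so ERM with the log loss selects exactly the maximum likelihood estimate.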
3. Structural Risk Minimization
When the sample size is large enough, the empirical risk is close to the true expected risk. However, in real problems we never have infinite samples; we usually obtain a subset of samples with unmeasured variables and noise, which often leads to overfitting. In such cases, we need to introduce regularization: \[ R_{srm}(\theta) = R_{emp}(\theta) + \lambda J(\theta) \] where \(J(\theta)\) is a function representing the complexity of the model, and \(\lambda\) is a penalty parameter used to control the degree of regularization.
When we use the structural risk as our objective function, the learning principle is called structural risk minimization (SRM).
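Continuing the earlier ERM sketch (the L2 penalty \(J(\theta)=\lVert\theta\rVert^2\) and the value of \(\lambda\) are illustrative choices, not the only possible ones):

```python
import numpy as np

def structural_risk(theta, X, y, lam):
    """R_srm(theta) = R_emp(theta) + lambda * J(theta), with J(theta) = ||theta||^2."""
    empirical = np.mean((X @ theta - y) ** 2)   # R_emp: average squared loss
    complexity = np.sum(theta ** 2)             # J(theta): model complexity term
    return empirical + lam * complexity

def structural_risk_grad(theta, X, y, lam):
    """Gradient of the regularized objective; the penalty shrinks theta toward zero."""
    return 2 * X.T @ (X @ theta - y) / len(y) + 2 * lam * theta
```

Minimizing this objective by the same gradient-descent loop as before trades a slightly larger empirical risk for a simpler model, which is exactly the purpose of the penalty term.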
For a probabilistic model, under certain conditions, SRM is equivalent to maximum a posteriori (MAP) estimation. Refer to the separate article on parameter estimation.
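The correspondence can be seen by taking the negative log of the posterior: \[ \arg\max_\theta P(\theta \mid D) = \arg\min_\theta \Big(-\sum_{n=1}^N \log P(y_n \mid x_n, \theta) - \log P(\theta)\Big), \] where the negative log-prior \(-\log P(\theta)\) plays the role of \(\lambda J(\theta)\); for example, a Gaussian prior on \(\theta\) yields an L2 penalty \(J(\theta)=\lVert\theta\rVert^2\).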
4. Objective function for Unsupervised Learning
For unsupervised learning, although we can still introduce regularization techniques, we normally do not classify unsupervised learning models under a particular learning principle. We instead emphasize the objective function of their optimization process; objective functions for specific algorithms can be found in the corresponding articles. One concrete example is sketched below.
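As one such objective (k-means is used here purely as an illustration), the optimizer minimizes the within-cluster sum of squared distances:

```python
import numpy as np

def kmeans_objective(X, centers, labels):
    """Within-cluster sum of squares: the quantity k-means seeks to minimize."""
    return np.sum((X - centers[labels]) ** 2)

# Tiny illustration with hand-picked centers and assignments.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [4.9, 5.1]])
centers = np.array([[0.05, 0.1], [4.95, 5.05]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, centers, labels))  # small value -> tight clusters
```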