In machine learning, a model's predictions can go wrong in two common ways: the first is overfitting and the second is underfitting.
To make predictions, a model needs information in the form of features. If the data contains too many features, including noisy or redundant ones, the model learns those details and becomes confused on new data; this condition is called overfitting. If too few features are given, that is, the information is incomplete, then the model cannot capture the real pattern; this condition is called underfitting.
Overfitting
An overfit model fits the training data too well.
When a large number of features are given, the model memorizes details of the training set and does not predict accurate results on new data.
Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.
Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when learning a target function. As such, many nonparametric machine learning algorithms also include parameters or techniques to limit and constrain how much detail the model learns.
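To make this concrete, here is a minimal sketch of overfitting, assuming only NumPy is available; the sine curve, noise level, and polynomial degree are illustrative choices, not part of the example below. A very flexible model fits the noisy training points almost perfectly but does badly on unseen points from the same curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# 15 noisy training samples of a sine curve
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=15)

# a dense, noise-free test set from the same underlying curve
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

# a very flexible model: a degree-10 polynomial on only 15 points
coeffs = np.polyfit(x_train, y_train, deg=10)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print("train MSE:", train_mse)  # near zero: the noise is memorized
print("test MSE: ", test_mse)   # much larger: poor generalization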
For example:
If we want to predict whether a round object is a ball, then some information is required, like:
shape
radius
whether it can be eaten
whether it can be played with
Here a large number of features are given, like:
shape: if the shape is round, the model predicts that the object is a ball.
radius: the model can also use the radius, but the radius differs between balls of different sizes, such as a cricket ball and a tennis ball, so this feature can confuse the model.
And more.
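As a toy sketch of this feature-based reasoning (the feature names, threshold values, and the looks_like_a_ball rule are all hypothetical, chosen only for illustration), a rule built from shape and radius alone can already mislabel new objects:

```python
# Hypothetical, hand-written rule; not a trained model.
def looks_like_a_ball(obj):
    # A rule that memorized one exact radius from training data
    # (say, a tennis ball's ~3.3 cm) would overfit, because a
    # cricket ball or a football has a different radius.
    return obj["shape"] == "round" and 3.0 <= obj["radius_cm"] <= 12.0

tennis_ball = {"shape": "round", "radius_cm": 3.3,
               "edible": False, "playable": True}
orange = {"shape": "round", "radius_cm": 4.0,
          "edible": True, "playable": False}

print(looks_like_a_ball(tennis_ball))  # True
print(looks_like_a_ball(orange))       # True: shape and radius alone mislabel an orange
```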
How to handle the problem of overfitting
Both overfitting and underfitting can lead to poor model performance. But by far the most common problem in applied machine learning is overfitting.
Overfitting is such a problem because the evaluation of machine learning algorithms on training data is different from the evaluation we actually care the most about, namely how well the algorithm performs on unseen data.
There are two important techniques that you can use when evaluating machine learning algorithms to limit overfitting:
Use a resampling technique to estimate model accuracy.
Hold back a validation dataset.
The most popular resampling technique is k-fold cross-validation. It allows you to train and test your model k times on different subsets of the training data and build up an estimate of how a machine learning model performs on unseen data.
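Here is a minimal sketch of both techniques, assuming scikit-learn is installed; the synthetic dataset and the decision tree model are illustrative choices, not prescribed by the text above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = DecisionTreeClassifier(random_state=0)

# Technique 1: k-fold cross-validation (here k = 5)
scores = cross_val_score(model, X, y, cv=5)
print("CV accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

# Technique 2: hold back a validation dataset
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
print("train accuracy:     ", model.score(X_train, y_train))  # often ~1.0 for a deep tree
print("validation accuracy:", model.score(X_val, y_val))      # usually noticeably lower
```

A large gap between training accuracy and validation (or cross-validation) accuracy is the usual signal of overfitting.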
Underfitting:
Underfitting means that incomplete information, or only a minimal number of features, is given, so the model is too simple to capture the underlying pattern.
For instance, if only the shape is given, then the model struggles to identify whether the object is a ball or some round-shaped fruit.
For example:
If the model is given a round object such as an orange, it cannot tell whether it is a ball or an orange, because shape alone is not enough information.
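For contrast with the overfitting sketch above, here is a minimal underfitting sketch, again assuming only NumPy; the sine target and the straight-line model are illustrative choices. A model that is too simple performs poorly even on the data it was fit to:

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)        # a clearly nonlinear target

# Too simple a model: a straight line (degree-1 polynomial)
coeffs = np.polyfit(x, y, deg=1)
pred = np.polyval(coeffs, x)

# The error is large even on the training data itself
print("training MSE:", np.mean((pred - y) ** 2))
```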
Conclusion:
Overfitting: Good performance on the training data, poor generalization to other data.
Underfitting: Poor performance on the training data and poor generalization to other data.