Ensemble Learning Algorithms

Ensemble learning is a supervised learning technique used in machine learning to improve overall performance by combining the predictions from multiple models.

Each model in the ensemble can be a different type of classifier or regressor.

How does Ensemble Learning Work?

Ensemble learning works on the principle of the “wisdom of the crowd”. By combining multiple models, we can improve the accuracy of the predictions.


Types of Ensemble Methods

  • Voting
  • Bootstrap aggregation (bagging)
  • Random Forests
  • Boosting
  • Stacked Generalization (Blending)

Voting

Voting is an ensemble machine learning algorithm that combines the predictions of multiple models, averaging the predicted values (regression) or taking the majority vote or the average of predicted probabilities (classification).

  • Same training sets
  • Different algorithms
  • sklearn.ensemble: VotingRegressor, VotingClassifier
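As a minimal sketch of the idea above — same training set, different algorithms — a hard-voting classifier could look like this (the iris dataset and the three chosen estimators are illustrative assumptions, not from the post):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same training set, different algorithms
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote; use "soft" to average predicted probabilities
)
voting.fit(X_train, y_train)
acc = voting.score(X_test, y_test)
print(acc)
```

With `voting="soft"`, every estimator must implement `predict_proba`, and the class probabilities are averaged instead of counting votes.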

Bootstrap Aggregation (Bagging)

Bagging, or bootstrap aggregation, is an ensemble method that reduces the variance of individual models by fitting the same base estimator (commonly a decision tree) on different bootstrap samples of the training set and aggregating their predictions.

  • Different training sets
  • Same algorithm
  • Two models from sklearn.ensemble: BaggingClassifier, BaggingRegressor

Random Forests

Random forests use decision trees as the base estimator and improve performance by taking the majority vote (classification) or average prediction (regression) of multiple decision trees. Each tree also considers a random subset of the features at each split, which decorrelates the trees.

Random forest is both a supervised learning algorithm and an ensemble algorithm.

  • Base estimator is a decision tree
  • Each estimator uses a different bootstrap sample of the training set
  • Two models from sklearn.ensemble: RandomForestClassifier, RandomForestRegressor
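Because the bootstrap sampling and feature subsetting are built in, a random forest takes little code (dataset choice is again illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is fit on a bootstrap sample of the training set;
# predictions are combined by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
acc = forest.score(X_test, y_test)
print(acc)
```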

Boosting

Boosting is an ensemble method that converts weak learners into strong learners by having each predictor fix the errors of its predecessor.

Boosting can be used in classification and regression problems.

Boosting machine learning algorithms work by:

  • Instantiating a weak learner (e.g. a CART tree with max_depth of 1)
  • Making predictions and passing the misclassified observations to the next predictor
  • Paying more attention at each iteration to the observations with prediction errors
  • Making new predictions until the maximum number of estimators is reached or accuracy stops improving

Common Boosting Algorithms

  • Gradient Boosting: Gradient boosting machines, Gradient Boosted Regression Trees
  • AdaBoost
  • XGBoost
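Of the algorithms above, gradient boosting is available directly in scikit-learn; a minimal sketch (illustrative dataset and hyperparameters):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each new tree is fit to the residual errors of the ensemble built so far
gbrt = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,  # shrinks each tree's contribution
    random_state=42,
)
gbrt.fit(X_train, y_train)
acc = gbrt.score(X_test, y_test)
print(acc)
```

XGBoost follows the same fit/predict pattern but ships as a separate package.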

Stacked Generalization (Blending)

Stacking, also known as stacked generalization, is an ensemble technique that improves accuracy by combining the predictions of multiple classification or regression models through a second-level model.

Stacking machine learning algorithms work by:

  • Using multiple first level models to predict on a training set
  • Combining (stacking) the predictions to generate a new training set
  • Fitting and predicting a second level model on the generated training set
  • From sklearn.ensemble: StackingClassifier
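The three steps above can be sketched with `StackingClassifier`, which handles the cross-validated generation of first-level predictions internally (dataset and estimator choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# First-level models produce cross-validated predictions, which become
# the training set for the second-level (final) estimator
stacking = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking.fit(X_train, y_train)
acc = stacking.score(X_test, y_test)
print(acc)
```

`StackingRegressor` follows the same pattern for regression problems.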

Conclusion

This concludes the introduction to ensemble machine learning algorithms. We have covered how ensemble learning works and provided an overview of the most common ensemble methods.

The next step is to learn how to use Scikit-learn to train each of these ensemble models on real data.