Random Forests | NRT Online Library for Data Science and Human Behavior

What is a Random Forest?

A random forest is an “ensemble” method made up of a collection of decision-tree models. In general, random forest models make reasonable predictions across a wide range of settings, while also being relatively simple to set up and run. The key to their success is their use of “bagging,” in which each decision tree in the model is fit to a sample of the training data drawn with replacement (i.e., one data point may appear multiple times in a particular decision tree’s training set, while another may not appear at all). The random forest then predicts the average (for regression) or the majority class (for classification) of all the individual decision-tree predictions.
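The bagging procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`bootstrap_sample`, `fit_stump`, `fit_bagged_ensemble`, `predict_majority`) and the toy data are made up for this example, a one-split "stump" stands in for a full decision tree, and the per-split random feature selection that full random forests also use is left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some repeat, some are left out."""
    return rng.choices(rows, k=len(rows))


def fit_stump(rows):
    """Fit a one-split stump on (x, label) pairs with a single numeric feature x.

    Simplified stand-in for a decision tree: every observed x is tried as a
    threshold, and the one with the fewest training errors is kept.
    """
    best = None
    for threshold, _ in rows:
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_bagged_ensemble(rows, n_trees=25, seed=0):
    """Fit each stump to its own bootstrap sample of the training rows."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict_majority(ensemble, x):
    """Classification: return the class predicted by most of the stumps."""
    votes = Counter(tree(x) for tree in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: small x values are "low", large ones are "high".
data = [(x, "low") for x in (1, 2, 3, 4, 5)] + [(x, "high") for x in (6, 7, 8, 9, 10)]
forest = fit_bagged_ensemble(data)
print(predict_majority(forest, 1), predict_majority(forest, 10))
```

For regression, the same structure would apply with the vote replaced by the mean of the individual predictions. In practice one would likely reach for a library implementation, such as the scikit-learn one covered in the tutorials below, rather than hand-rolled code like this.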

Further Learning

Video

Here’s a pretty accessible introduction to random forests. https://youtu.be/J4Wdy0Wc_xQ

This Coursera video introduces random forests with a Python example: What is a Random Forest and How is it “Grown”?
This video focuses more on the techniques and methods than on the statistics behind them: Random Forests

Interesting Articles

Introduction to Random forest. This post is a good overview of random forests. It explains what a decision tree is and then moves on to the random forest. It walks through an implementation of the random forest in R, with explanation, before covering the pros and cons of random forests and their applications.
A Guide To Random Forests: Consolidating Decision Trees. This article covers the basics of random forests, the disadvantages of decision trees, the differences between decision trees and random forests, and how to code the algorithm.

Online tutorials

Understanding Random Forests Classifiers in Python. This DataCamp tutorial explains random forests in an easy-to-understand way. It then clarifies the differences between a random forest and a decision tree and elaborates on the advantages and disadvantages of each. Finally, it shows a code example that uses scikit-learn in Python to classify iris flowers with a random forest. The dataset it uses is bundled with the scikit-learn package.
An Implementation and Explanation of the Random Forest in Python. This Towards Data Science tutorial implements and explains the random forest in Python. It starts by explaining the decision tree, then explains the random forest, and then implements it in Python in a Jupyter notebook.
Random forests code in python [github]. Python code in a Jupyter notebook. The notebook starts with a single decision tree and a simple problem, then builds up to a random forest and a real-world problem.

Theory papers

This ScienceDirect paper presents a weed/maize classification and identifies the most important features using a random forest. Recognising weeds in a maize crop using a random forest machine-learning algorithm and near-infrared snapshot mosaic hyperspectral imagery
This paper describes a methodology for authorship profiling that uses random forest models for classification and regression. A Random Forest Approach for Authorship Profiling