Tools: Decision Trees – The Unreasonable Power Of Nested Decision Rules

Let's pretend we're farmers with a new plot of land. Given only the Diameter and Height of a tree trunk, we must determine if it's an Apple, Cherry, or Oak tree. To do this, we'll use a Decision Tree.

Almost every tree with a Diameter ≥ 0.45 is an Oak tree! Thus, we can probably assume that any other trees we find in that region will also be Oaks. This first decision node will act as our root node. We'll draw a vertical line at this Diameter and classify everything above it as Oak (our first leaf node), and continue to partition our remaining data on the left.
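The root node described above is just a single threshold check. A minimal sketch in Python, where the 0.45 threshold comes from the walkthrough and everything past that first check is still a placeholder:

```python
def classify(diameter, height):
    """Classify a tree from its trunk measurements (root node only)."""
    if diameter >= 0.45:           # root node: large trunks are Oaks
        return "Oak"               # first leaf node
    return "Apple or Cherry"       # left branch: still to be partitioned
```

Passing a wide trunk like `classify(0.5, 6.0)` lands in the Oak leaf; everything narrower falls through to the region we haven't split yet.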

We continue along, hoping to split our plot of land in the most favorable manner. We see that creating a new decision node at Height ≤ 4.88 leads to a nice section of Cherry trees, so we partition our data there. Our Decision Tree updates accordingly, adding a new leaf node for Cherry.

After this second split we're left with an area containing many Apple and some Cherry trees. No problem: a vertical division can be drawn to separate the Apple trees a bit better. Once again, our Decision Tree updates accordingly.

The remaining region just needs one more horizontal division and boom - our job is done! We've obtained a solid set of nested decision rules. That said, some regions still enclose a few misclassified points. Should we continue splitting, partitioning the data into ever-smaller sections? Hmm...
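At this point the whole tree is nothing more than nested if/else logic. A sketch of the finished rules: the first two thresholds (0.45 and 4.88) come from the walkthrough, but the last two splits are hypothetical stand-ins, since the article never states their exact values.

```python
def classify_tree(diameter, height):
    """Full nested decision rules; later thresholds are illustrative."""
    if diameter >= 0.45:
        return "Oak"               # first leaf (root split)
    if height <= 4.88:
        return "Cherry"            # second leaf
    if diameter >= 0.30:           # hypothetical third split (vertical)
        if height <= 7.0:          # hypothetical fourth split (horizontal)
            return "Cherry"
        return "Apple"
    return "Apple"
```

Each new data point simply flows down one path of conditions until it reaches a leaf.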

If we do, the resulting regions will become increasingly complex, and our tree will become unreasonably deep. Such a Decision Tree would learn too much from the noise in the training examples and too few generalizable rules. Does this sound familiar? It's the well-known tradeoff we explored in our explainer on The Bias Variance Tradeoff! In this case, going too deep produces a tree that overfits our data, so we'll stop here. We're done! To classify any new data point as an Apple, Cherry, or Oak tree, we simply pass its Height and Diameter values through our newly created Decision Tree.
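One common way to keep a tree from going too deep is a depth cap. A minimal, pure-Python sketch of greedy tree-building with a `max_depth` limit, in the spirit of CART; the toy data, thresholds, and helper names are all illustrative, not the article's actual dataset:

```python
def gini(labels):
    """Gini impurity: 0 for a pure region, higher for mixed ones."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every (feature, threshold) pair; keep the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Recursively split until pure or the depth cap is hit."""
    split = best_split(rows, labels)
    if len(set(labels)) == 1 or depth == max_depth or split is None:
        return max(set(labels), key=labels.count)   # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(node, row):
    """Walk the nested rules until we reach a leaf label."""
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if row[f] < t else hi
    return node

# Toy (diameter, height) measurements standing in for our plot of land.
rows = [(0.2, 3.0), (0.25, 4.0), (0.3, 6.0), (0.5, 7.0), (0.6, 8.0)]
labels = ["Cherry", "Cherry", "Apple", "Oak", "Oak"]
tree = build(rows, labels, max_depth=3)
```

Raising `max_depth` lets the tree carve out ever-smaller regions that chase noise; capping it forces each leaf to settle for a majority vote, which is exactly the bias-variance dial the paragraph above describes.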

Source: HackerNews