This website’s goal is to develop and explain a data science philosophy – overkill analytics – that leverages computing scale and rapid development technologies to produce faster, better, and cheaper solutions to predictive modeling problems. To achieve this goal, one core question must be answered: when attacking data science problems, how can we use CPU as a substitute for IQ? This post will discuss the fundamental ‘overkill’ weapon for addressing this question – ensemble learning.
Ensembles are nothing new, of course; they underlie many of the most popular machine learning algorithms (e.g., random forests and generalized boosted models). The theory is that consensus opinions from diverse modeling techniques are more reliable than potentially biased or idiosyncratic predictions from a single source. More broadly, this principle is as basic as “two heads are better than one.” It’s why cancer patients get second opinions, why the Supreme Court upheld affirmative action, why n
Well, maybe the principle isn’t universally applied. Still, it is fundamental to many disciplines and holds enormous value for the data scientist. Below, I will explain why, by addressing the following: