The following pages and posts are tagged with

| Title | Type | Excerpt |
| --- | --- | --- |
| AdaBoost | Page | Short for Adaptive Boosting, AdaBoost is another ensemble algorithm. **Differences with decision trees** In random forests, we build complete trees each time, but in AdaBoost, each tree consists of only a node and two leaves (which is called a stump). Another difference from RF is that... |
| Decision trees | Page | Decision trees are one of the most popular ML algorithms. They can be used for regression and classification. They categorize data in a similar way to human thinking. Therefore, they are also easy to understand and interpret. Succinctly, in a decision tree, each node represents... |
| Gradient-boosted trees | Page | **Gradient-boosted trees** This algorithm can also be used for regression and classification. It builds trees one after another, each new tree fixing the problems of the previous one. It involves no randomization by default. It uses shallow trees (maximum depth about 5). Therefore, it requires less... |
| K-Nearest Neighbors (KNN) | Page | KNN is a nonparametric learning algorithm, i.e., it does not make any assumptions about the structure of the data. It is a distance-based majority vote algorithm. It checks the categories of the k nearest neighbors of a data sample, and assigns the sample to the category of the majority of... |
| Linear regression | Page | **Ordinary linear regression** Least squares is a widely used method for finding the "line of best fit". Fitting a line through the cloud of points will result in the line not passing through many of the points. What is the best way... |
| Naive Bayes | Page | Naive Bayes is a probabilistic algorithm that considers the features as _independent_, hence the term naive. In general, it is similar to linear models, but it is faster to train and generalizes worse. There are three kinds of NB: Gaussian, Bernoulli, and multinomial. ... |
| Neural Networks | Page | **What is a neural network** A "neuron" in a neural network is a function. It accepts some inputs, applies some calculations to them, and then returns a single number. For regression... |
| Random Forests | Page | **Ensemble algorithms** Random forest is an ensemble algorithm. Ensemble algorithms combine multiple ML models to create a more powerful one. In competitions, ensemble algorithms are usually the winners. The two most common ensemble algorithms are _random forests_ and _gradient-boosted decision trees_. **Random forests** A... |
| Ridge and LASSO regressions | Page | **Ridge** Ridge regression is similar to linear regression, but the choice of coefficients is subject to an additional constraint beyond best fitting the data. The new cost function is $$ C = \frac{1}{2m} \left( \sum_{i=1}^{m} (y_i - a_1x_i - a_0)^2 + \lambda \sum_{j=1}^{n} a_j^2 \right) $$, where... |
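
As a rough illustration of the ridge cost function quoted in the last excerpt, here is a minimal Python sketch. It assumes a single feature (so n = 1 and the penalty covers only the slope `a1`, leaving the intercept `a0` unpenalized, since the sum starts at j = 1). The function name `ridge_cost`, its parameter names, and the sample data are hypothetical and do not come from the tagged pages.

```python
def ridge_cost(xs, ys, a0, a1, lam):
    """Ridge cost for a single-feature linear model y ~ a1 * x + a0.

    Hypothetical helper: mirrors C = 1/(2m) * (sum of squared residuals
    + lam * sum of squared coefficients), assuming n = 1.
    """
    m = len(xs)
    # Sum of squared residuals between the targets and the predictions.
    squared_error = sum((y - a1 * x - a0) ** 2 for x, y in zip(xs, ys))
    # L2 penalty on the slope only; the intercept a0 is not penalized.
    penalty = lam * a1 ** 2
    return (squared_error + penalty) / (2 * m)


# Made-up data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit = ridge_cost(xs, ys, a0=0.0, a1=2.0, lam=0.0)  # no penalty
penalized = ridge_cost(xs, ys, a0=0.0, a1=2.0, lam=3.0)    # same fit, with penalty
```

With `lam = 0` the cost reduces to the ordinary least-squares cost, so the perfectly fitting line scores zero. A positive `lam` raises the cost of the same line in proportion to the squared slope, which is what pushes ridge regression toward smaller coefficients.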