Kaggle Evaluation Metrics Used for Regression Problems

Absolute Error - AE
Mean Absolute Error - MAE
Weighted Mean Absolute Error - WMAE
Pearson Correlation Coefficient
Spearman’s Rank Correlation
Root Mean Squared Error - RMSE
Root Mean Squared Logarithmic Error - RMSLE
Mean Columnwise Root Mean Squared Error - MCRMSE
References

While crafting a machine learning model, there is always a need to assess its performance. When trying multiple models or tuning hyperparameters, it is useful to compare different approaches and choose the best one. The sklearn.metrics module provides a plethora of metrics suitable for distinct purposes.

In this series of posts I will discuss four groups of common machine learning tasks, each of which requires specific metrics:

Regression - predict the value of one or more continuous variables, e.g. predict the stock price of a given asset or the temperature for the next day.
Binary classification - assign a sample to one of two classes, e.g. classify an image as containing a "cat" or a "dog".
Multiple class classification - assign a sample to one of many classes, e.g. classify a new article into a category such as "sport", "politics", "economy", "pop-culture", ...
Other

The Kaggle competitions give insight into the approach taken by the Kaggle team to select the best evaluation metric for a given task. There used to be a Kaggle wiki containing short definitions of the metrics used in Kaggle competitions, but it is no longer available. In this post we will look closer at the first group and explain a few model evaluation metrics used in regression problems.

Absolute Error - AE

The sum of the absolute value of each individual error.

$$ \mathrm{AE} = \sum_{i=1}^n | y_i - \hat{y}_i | $$

Where:

$|e_i| = |y_i-\hat{y}_i|$ - absolute error of the $i$-th sample,

$n$ - number of test samples,

$y_i$ - actual variable value,

$\hat{y}_i$ - predicted variable value.
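The AE formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `absolute_error` and the sample values are assumptions made for illustration, not taken from any competition.

```python
def absolute_error(y_true, y_pred):
    """Sum of absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred))


# Hypothetical actual and predicted values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

ae = absolute_error(y_true, y_pred)
print(ae)  # 0.5 + 0.0 + 2.0 + 1.0 = 3.5
```

Because AE is a plain sum, doubling the number of samples with similar errors roughly doubles the score.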

Because AE is a sum rather than a mean, its value grows with the number of samples, so it can differ notably between public and private leaderboard calculations. Another drawback of the Absolute Error metric is that it cannot be compared directly across models that predict variables on different scales. For example, when the same model is used to predict the S&P 500 index and the Microsoft stock price, we cannot compare its performance on the two tasks using this metric, since the units and ranges are different: the S&P 500 is expressed in points and the stock price is expressed in dollars. In this situation one can use a percentage error to get an evaluation metric on a common scale.

Exemplary competition using Absolute Error for model evaluation:
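To make the scale problem concrete, here is a small sketch comparing AE with one possible percentage-based metric, the mean absolute percentage error. The choice of this particular percentage metric, the function names, and all numbers are assumptions for illustration; the values are not real market data.

```python
def absolute_error(y_true, y_pred):
    """Sum of absolute errors."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred))


def mean_absolute_percentage_error(y_true, y_pred):
    """Mean of |error / actual| in percent. Assumes no actual value is zero."""
    ratios = [abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical index values (points) and stock prices (dollars).
# Every prediction is off by 1% of the actual value.
index_true, index_pred = [4000.0, 4100.0], [4040.0, 4141.0]
stock_true, stock_pred = [300.0, 310.0], [303.0, 313.1]

ae_index = absolute_error(index_true, index_pred)   # about 81 points
ae_stock = absolute_error(stock_true, stock_pred)   # about 6.1 dollars
mape_index = mean_absolute_percentage_error(index_true, index_pred)
mape_stock = mean_absolute_percentage_error(stock_true, stock_pred)

print(ae_index, ae_stock)      # very different, although relative errors match
print(mape_index, mape_stock)  # both close to 1.0 (percent)
```

The two AE values differ by more than an order of magnitude even though the model is equally accurate in relative terms, while the percentage metric puts both tasks on the same scale.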

Forecast Eurovision Voting - This competition requires contestants to forecast the voting for this year's Eurovision Song Contest in Norway on May 25th, 27th and 29th.

Mean Absolute Error - MAE

Mean of the absolute value of each individual error.

The mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is given by the formula:

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^n \left| y_i - \hat{y}_i\right| =\frac{1}{n}\sum_{i=1}^n \left| e_i \right|. $$

Where:

$n$ - number of test samples,

$y_i$ - actual variable value,

$\hat{y}_i$ - predicted variable value.
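A minimal plain-Python sketch of the MAE formula follows. The function name and data are illustrative assumptions; in practice `sklearn.metrics.mean_absolute_error(y_true, y_pred)` computes the same quantity.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical actual and predicted values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # (0.5 + 0.0 + 2.0 + 1.0) / 4 = 0.875
```

Unlike AE, the result does not grow with the number of samples, so scores computed on test sets of different sizes remain comparable.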

See also the paper: Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance.

Five exemplary competitions using Mean Absolute Error for model evaluation:

LANL Earthquake Prediction - Can you predict upcoming laboratory earthquakes?
PUBG Finish Placement Prediction - Can you predict the battle royale finish of PUBG Players?
Allstate Claims Severity - How severe is an insurance claim?
Loan Default Prediction - Imperial College London - Constructing an optimal portfolio of loans.
Finding Elo - Predict a chess player's FIDE Elo rating from one game.

Weighted Mean Absolute Error - WMAE

Weighted average of absolute errors.

WMAE can be used as an evaluation tool for better assessing model performance with respect to the goals of the application. For example, when recommending books or movies, the accuracy of the predictions may vary between older and more recent products. In this situation it is not reasonable to treat every error equally, so more weight should be put on recent items.

WMAE can also be useful as a diagnostic tool that, like a "magnifying lens", helps to identify the cases an algorithm is having trouble with. The formula for calculating WMAE is:

$$ \textrm{WMAE} = \frac{1}{n} \sum_{i=1}^n w_i | y_i - \hat{y}_i |, $$

where:

$n$ - number of test samples,

$w_i$ - weights for sample $i$,

$y_i$ - actual variable value,

$\hat{y}_i$ - predicted variable value.
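The sketch below implements the WMAE formula exactly as written above, dividing by $n$. The function name, the weights, and the data are hypothetical. The optional `normalize_by_weights` flag is an addition not found in the formula above: it divides by the sum of the weights instead, a variant some competitions use so that the result stays a true weighted average whatever the weights sum to.

```python
def weighted_mean_absolute_error(y_true, y_pred, weights,
                                 normalize_by_weights=False):
    """Weighted mean absolute error.

    By default divides by the number of samples n, as in the formula above.
    With normalize_by_weights=True (an assumed variant), divides by sum(weights).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred) or n != len(weights):
        raise ValueError("inputs must be non-empty and of equal length")
    weighted_sum = sum(w * abs(y - y_hat)
                       for y, y_hat, w in zip(y_true, y_pred, weights))
    denominator = sum(weights) if normalize_by_weights else n
    return weighted_sum / denominator


# Hypothetical data: the last sample (e.g. a recent item) counts five times more.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
weights = [1.0, 1.0, 5.0]

wmae = weighted_mean_absolute_error(y_true, y_pred, weights)
wmae_normalized = weighted_mean_absolute_error(
    y_true, y_pred, weights, normalize_by_weights=True)
print(wmae)             # (2 + 2 + 15) / 3
print(wmae_normalized)  # (2 + 2 + 15) / 7
```

With all weights equal to 1 both variants reduce to the ordinary MAE.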

Two exemplary competitions using Weighted Mean Absolute Error for model evaluation:

The Winton Stock Market Challenge - Join a multi-disciplinary team of research scientists.
Walmart Recruiting - Store Sales Forecasting - Use historical markdown data to predict store sales.

Pearson Correlation Coefficient

Covariance of the two variables divided by the product of the standard deviation of each data sample.

It is the normalization of the covariance between the two variables to give an interpretable score. The Pearson correlation coefficient can be used to summarize the strength of the linear relationship between two data samples. The formula for calculating Pearson correlation coefficient is:

$$ p = \frac{\mathrm{cov}(y_i, \hat{y}_i)}{\mathrm{std}(y_i)\,\mathrm{std}(\hat{y}_i)} $$

where:

$\mathrm{cov}()$ - covariance function,

$\mathrm{std}()$ - standard deviation,

$y_i$ - actual variable value,

$\hat{y}_i$ - predicted variable value,

$p$ - Pearson correlation coefficient.

Because the calculation relies on the mean and standard deviation, the coefficient is most meaningful when the data samples have a Gaussian or Gaussian-like distribution.
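A minimal sketch of the Pearson formula in plain Python is given below; the function name and data are illustrative assumptions. In practice `scipy.stats.pearsonr` returns the same coefficient. Note that the $1/n$ factors in the covariance and the standard deviations cancel, so the code works with plain sums.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Covariance of the two samples divided by the product of their
    standard deviations (the 1/n factors cancel out)."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two samples of equal length >= 2")
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((y - mean_true) * (p - mean_pred)
              for y, p in zip(y_true, y_pred))
    var_true = sum((y - mean_true) ** 2 for y in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    if var_true == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_true * var_pred)


# Hypothetical data: predictions are a scaled and shifted copy of the truth.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_biased = [2.0 * y + 1.0 for y in y_true]
y_reversed = [-y for y in y_true]

p_biased = pearson_correlation(y_true, y_biased)      # close to 1.0
p_reversed = pearson_correlation(y_true, y_reversed)  # close to -1.0
print(p_biased, p_reversed)
```

The first result illustrates a property worth keeping in mind: the coefficient measures how well predictions follow a linear trend, not how close they are to the actual values, so systematically biased predictions can still score close to 1.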

Exemplary competition using Pearson Correlation Coefficient for model evaluation:

Merck Molecular Activity Challenge - Help develop safe and effective medicines by predicting molecular activity.

Spearman’s Rank Correlation

Covariance of the two variables converted to ranks divided by the product of the standard deviation of ranks for each variable.

Two variables may be related by a nonlinear relationship, such that the relationship is stronger or weaker across the distribution of the variables. The two variables being considered may have a non-Gaussian distribution.

Spearman's correlation coefficient can be used to summarize the strength of a monotonic, possibly nonlinear, relationship between the two data samples. Raw scores $y_i$ and $\hat{y}_i$ are converted to ranks $ry_i$ and $\hat{ry}_i$ respectively. The formula for calculating Spearman's rank correlation coefficient is:

$$ r=\frac{\mathrm{cov}(ry_i, \hat{ry}_i)}{\mathrm{std}(ry_i)\,\mathrm{std}(\hat{ry}_i)} $$

where:

$\mathrm{cov}()$ - covariance function,

$\mathrm{std}()$ - standard deviation,

$ry_i$ - rank of actual variable value,

$\hat{ry}_i$ - rank of predicted variable value,

$r$ - Spearman's correlation coefficient.
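The two steps, converting values to ranks and then applying the Pearson formula to the ranks, can be sketched in plain Python as follows. The helper names, the choice of average ranks for ties, and the data are assumptions for illustration; `scipy.stats.spearmanr` is the usual library routine for this coefficient.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_correlation(a, b):
    """Pearson coefficient. Assumes neither sample is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_correlation(y_true, y_pred):
    """Pearson correlation computed on the ranks of both samples."""
    return pearson_correlation(average_ranks(y_true), average_ranks(y_pred))


# Hypothetical data: predictions grow as the cube of the actual values,
# a monotonic but nonlinear relationship.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [y ** 3 for y in y_true]

r = spearman_correlation(y_true, y_pred)  # ranks agree perfectly
p = pearson_correlation(y_true, y_pred)   # below 1: relation is not linear
print(r, p)
```

Since only the ordering matters, any strictly increasing transformation of the predictions leaves the Spearman coefficient unchanged.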

Exemplary competition using Spearman’s Rank Correlation for model evaluation:

[Draper Satellite Image Chronology](https://www.kaggle.com/c/draper-satellite-image-chronology#evaluation) - Can you put order to space and time?

Root Mean Squared Error - RMSE

The square root of the mean of the squared errors.

The use of RMSE is very common and it makes an excellent general purpose error metric for numerical predictions. Compared to the similar Mean Absolute Error, RMSE amplifies and severely punishes large errors. The formula for calculating RMSE is:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$

where:

$n$ - number of test samples,

$y_i$ - actual variable value,

$\hat{y}_i$ - predicted variable value.
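The following sketch computes RMSE in plain Python and contrasts it with MAE on two hypothetical sets of predictions that have the same total absolute error. Function names and data are assumptions for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of squared errors."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


# Hypothetical data: both prediction sets have a total absolute error of 4,
# spread evenly in the first case and concentrated in one sample in the second.
y_true = [10.0, 10.0, 10.0, 10.0]
pred_even = [11.0, 11.0, 11.0, 11.0]
pred_outlier = [10.0, 10.0, 10.0, 14.0]

mae_even = mean_absolute_error(y_true, pred_even)            # 1.0
mae_outlier = mean_absolute_error(y_true, pred_outlier)      # 1.0
rmse_even = root_mean_squared_error(y_true, pred_even)       # 1.0
rmse_outlier = root_mean_squared_error(y_true, pred_outlier) # 2.0
print(mae_even, mae_outlier, rmse_even, rmse_outlier)
```

MAE cannot tell the two cases apart, while RMSE doubles when the error is concentrated in a single large miss.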

Five exemplary competitions using Root Mean Squared Error for model evaluation:

Elo Merchant Category Recommendation - Help understand customer loyalty.
Google Analytics Customer Revenue Prediction - Predict how much GStore customers will spend.
House Prices: Advanced Regression Techniques - Predict sales prices and practice feature engineering, RFs, and gradient boosting.
Predict Future Sales - Final project for "How to win a data science competition" Coursera course.
New York City Taxi Fare Prediction - Can you predict a rider's taxi fare?

Root Mean Squared Logarithmic Error - RMSLE

Root mean squared error of variables transformed to logarithmic scale.

$$ \mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(\hat{y}_i + 1) - \log(y_i + 1))^2} $$

Where:

$n$ - number of test samples,

$\hat{y}_i$ - predicted variable value,

$y_i$ - actual variable value,

$\log(x)$ - natural logarithm of $x$.

The RMSLE is higher when the discrepancies between predicted and actual values are larger. Compared to Root Mean Squared Error (RMSE), RMSLE does not heavily penalize huge discrepancies between the predicted and actual values when both values are huge. In these cases only the relative differences matter, because the difference of two logarithms equals the logarithm of the ratio of the variables.

Five exemplary competitions using Root Mean Squared Logarithmic Error for model evaluation:
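The sketch below illustrates this behaviour with a plain-Python RMSLE. The function names and numbers are hypothetical; `math.log1p(x)` computes $\log(x + 1)$, and the function assumes all values are greater than $-1$ so the logarithm is defined.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def root_mean_squared_log_error(y_true, y_pred):
    """RMSE of log(1 + value). Assumes every value is greater than -1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if min(min(y_true), min(y_pred)) <= -1:
        raise ValueError("values must be greater than -1")
    squared = [(math.log1p(p) - math.log1p(y)) ** 2
               for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / n)


# Hypothetical data: both predictions overshoot the actual value by 50%.
small_true, small_pred = [100.0], [150.0]
large_true, large_pred = [10000.0], [15000.0]

rmse_small = root_mean_squared_error(small_true, small_pred)   # 50.0
rmse_large = root_mean_squared_error(large_true, large_pred)   # 5000.0
rmsle_small = root_mean_squared_log_error(small_true, small_pred)
rmsle_large = root_mean_squared_log_error(large_true, large_pred)
print(rmse_small, rmse_large)    # differ by a factor of 100
print(rmsle_small, rmsle_large)  # both close to log(1.5), about 0.4
```

RMSE treats the second error as a hundred times worse, whereas RMSLE scores both almost identically because the ratio of predicted to actual value is the same.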

Santander Value Prediction Challenge - Predict the value of transactions for potential customers.
Mercari Price Suggestion Challenge - Can you automatically suggest product prices to online sellers?
Recruit Restaurant Visitor Forecasting - Predict how many future visitors a restaurant will receive
New York City Taxi Trip Duration - Share code and data to improve ride time predictions
Sberbank Russian Housing Market - Can you predict realty price fluctuations in Russia’s volatile economy?

Mean Columnwise Root Mean Squared Error - MCRMSE

The RMSE is computed separately for each of the m target variables (columns) over the n test samples, and the resulting values are averaged.

$$ \mathrm{MCRMSE} = \frac{1}{m}\sum_{j=1}^{m}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{ij}-\hat{y}_{ij})^2} $$

Note that each square-root term is the RMSE of the $j$-th variable, thus we can write:

$$ \mathrm{MCRMSE} = \frac{1}{m}\sum_{j=1}^{m}\mathrm{RMSE}_j $$

Where:

$m$ - number of predicted variables,

$n$ - number of test samples,

$y_{ij}$ - $i$-th actual value of $j$-th variable,

$\hat{y}_{ij}$ - $i$-th predicted value of $j$-th variable.
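A minimal plain-Python sketch of MCRMSE follows, assuming the data is stored as a list of rows with one row per test sample and one column per target variable. The function names, the layout, and the numbers are assumptions for illustration.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mean_columnwise_rmse(y_true_rows, y_pred_rows):
    """Average of the per-column RMSE values.

    Each argument is a list of n rows, each row holding m target values.
    """
    if len(y_true_rows) == 0 or len(y_true_rows) != len(y_pred_rows):
        raise ValueError("inputs must be non-empty and of equal length")
    # zip(*rows) transposes the rows into m columns of length n.
    columns_true = list(zip(*y_true_rows))
    columns_pred = list(zip(*y_pred_rows))
    column_scores = [root_mean_squared_error(col_true, col_pred)
                     for col_true, col_pred in zip(columns_true, columns_pred)]
    return sum(column_scores) / len(column_scores)


# Hypothetical data: n = 3 samples, m = 2 target variables.
# The first column is always off by 1, the second always by 3.
y_true_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_pred_rows = [[2.0, 13.0], [3.0, 23.0], [4.0, 33.0]]

mcrmse = mean_columnwise_rmse(y_true_rows, y_pred_rows)
print(mcrmse)  # (RMSE_1 + RMSE_2) / 2 = (1.0 + 3.0) / 2 = 2.0
```

Because the columns are averaged without rescaling, a target variable with a larger numeric range contributes more to the final score.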

Exemplary competition using Mean Columnwise Root Mean Squared Error for model evaluation:

Africa Soil Property Prediction Challenge - Predict physical and chemical properties of soil using spectral measurements

References

Any comments or suggestions? Let me know.
