R-squared is a statistical measure that represents the proportion of the variance in the dependent variable explained by the independent variables in a regression model.
R-squared, also known as the coefficient of determination, is a widely used metric in machine learning and statistics for evaluating the performance of regression models. It quantifies the proportion of the variance in the dependent variable that can be explained by the independent variables in the model. R-squared values typically range from 0 to 1, with higher values indicating a better fit of the model to the data.
Recent research on R-squared has explored various aspects and applications of this metric. For instance, a non-inferiority test for R-squared with random regressors has been proposed to determine the lack of association between an outcome variable and explanatory variables. Another study introduced a generalized R-squared (G-squared) for detecting dependence between two random variables, which is particularly effective in handling nonlinearity and heteroscedastic errors.
In the realm of practical applications, R-squared has been employed in various fields. One example is the Fama-French model, which is used to assess portfolio performance compared to market returns. Researchers have revisited this model and suggested considering heavy tail distributions for more accurate results. Another application is in the prediction of housing prices using satellite imagery, where incorporating satellite images into the model led to a significant improvement in R-squared scores. Lastly, R-squared has been utilized in building a prediction model for system testing defects, serving as an early quality indicator for software entering system testing.
In conclusion, R-squared is a valuable metric for evaluating the performance of regression models and has been the subject of ongoing research and practical applications. Its versatility and interpretability make it an essential tool for both machine learning experts and developers alike, helping them understand the relationships between variables and make informed decisions based on their models.

R-Squared
R-Squared Further Reading
1. A non-inferiority test for R-squared with random regressors. Harlan Campbell. http://arxiv.org/abs/2002.08476v2
2. Analysis of variance, coefficient of determination and $F$-test for local polynomial regression. Li-Shan Huang, Jianwei Chen. http://arxiv.org/abs/0810.4808v1
3. Generalized R-squared for Detecting Dependence. Xufei Wang, Bo Jiang, Jun S. Liu. http://arxiv.org/abs/1604.02736v3
4. Goal Clustering: VNS based heuristics. Pedro Martins. http://arxiv.org/abs/1705.07666v4
5. A New Look to Three-Factor Fama-French Regression Model using Sample Innovations. Javad Shaabani, Ali Akbar Jafari. http://arxiv.org/abs/2006.02467v1
6. House Price Prediction using Satellite Imagery. Sina Jandaghi Semnani, Hoormazd Rezaei. http://arxiv.org/abs/2105.06060v1
7. Hamiltonian Formulation of Bianchi Cosmological Models in Quadratic Theories of Gravity. Jacques Demaret, Laurent Querella. http://arxiv.org/abs/gr-qc/9510065v1
8. Finite temperature R-squared quantum gravity. C. D. Burton. http://arxiv.org/abs/1302.1880v1
9. A Prediction Model for System Testing Defects using Regression Analysis. Muhammad Dhiauddin Mohamed Suffian, Suhaimi Ibrahim. http://arxiv.org/abs/1401.5830v1
10. Novel Mining of Cancer via Mutation in Tumor Protein P53 using Quick Propagation Network. Ayad Ghany Ismaeel, Raghad Zuhair Yousif. http://arxiv.org/abs/1505.06751v1

R-Squared Frequently Asked Questions
What does the R-squared value tell you?
R-squared, or the coefficient of determination, is a statistical measure that indicates how well the independent variables in a regression model explain the variance in the dependent variable. It ranges from 0 to 1, with higher values indicating a better fit of the model to the data. In other words, R-squared tells you the proportion of the total variation in the dependent variable that can be accounted for by the independent variables in the model.
What is a good R-squared?
A good R-squared value depends on the context and the specific problem being addressed. Generally, a higher R-squared value indicates a better fit of the model to the data. However, it is important to note that a high R-squared value does not necessarily imply that the model is accurate or reliable. It is essential to consider other factors, such as the complexity of the model, the number of independent variables, and the quality of the data, when evaluating the performance of a regression model.
What does an R-squared value of 0.5 mean?
An R-squared value of 0.5 means that 50% of the variance in the dependent variable can be explained by the independent variables in the regression model. In other words, half of the total variation in the dependent variable is accounted for by the model, while the other half remains unexplained. This value can be considered moderate, but it is essential to evaluate the model's performance in the context of the specific problem and the quality of the data.
What does R-squared stand for?
R-squared stands for the coefficient of determination. It is a statistical measure used to evaluate the performance of regression models by quantifying the proportion of the variance in the dependent variable that can be explained by the independent variables in the model.
How is R-squared calculated?
R-squared is calculated using the following formula:

R-squared = 1 - (Sum of Squared Residuals / Total Sum of Squares)

The Sum of Squared Residuals (SSR) is the sum of the squared differences between the observed values and the predicted values of the dependent variable. The Total Sum of Squares (TSS) is the sum of the squared differences between the observed values and the mean of the dependent variable. Dividing SSR by TSS and subtracting the result from 1 gives the R-squared value.
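The formula above can be sketched in a few lines of plain Python; the data here is made up purely for illustration:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (sum of squared residuals / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # squared residuals
    tss = sum((y - mean_y) ** 2 for y in y_true)             # squared deviations from the mean
    return 1 - ssr / tss

# Toy data: predictions that track the observations closely
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]
print(r_squared(y_true, y_pred))  # about 0.995
```

Because the predictions are close to the observations, SSR is small relative to TSS and R-squared is close to 1.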
Can R-squared be negative?
When R-squared is computed for an ordinary least squares model with an intercept on its own training data, it falls between 0 and 1. In practice, however, R-squared can be negative: this happens whenever the model's predictions fit the data worse than simply predicting the mean of the dependent variable. Negative values are common when a model is evaluated on held-out data, fitted without an intercept, or otherwise mis-specified, and they signal that the chosen model is not suitable for the data.
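A quick sketch with made-up numbers shows how predictions worse than the mean drive R-squared below zero:

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ssr / tss

y_true = [1.0, 2.0, 3.0, 4.0]

# Predicting the mean everywhere makes SSR equal TSS, so R-squared is exactly 0...
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0

# ...while anti-correlated predictions do worse than the mean: R-squared < 0
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0
```

The mean-predicting baseline is exactly the reference point R-squared is measured against, which is why anything worse than it goes negative.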
How does R-squared relate to correlation?
For a linear regression fitted by ordinary least squares with an intercept, R-squared equals the square of the correlation coefficient (r) between the observed and predicted values of the dependent variable; in simple linear regression with a single predictor, it also equals the square of the correlation between the predictor and the outcome. The correlation coefficient measures the strength and direction of the linear relationship between two variables, while R-squared quantifies the proportion of the variance in the dependent variable explained by the model. In other words, R-squared is a measure of the goodness of fit of the regression model, while correlation is a measure of the linear association between variables.
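For a simple ordinary-least-squares fit with an intercept, this identity can be checked numerically; the data below is invented for the demonstration:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least squares: closed-form slope and intercept
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
y_pred = [intercept + slope * a for a in x]

# R-squared from residuals
ssr = sum((b - p) ** 2 for b, p in zip(y, y_pred))
tss = sum((b - my) ** 2 for b in y)
r2 = 1 - ssr / tss

print(abs(r2 - pearson_r(x, y) ** 2) < 1e-9)  # True: the two coincide
```

The agreement is exact (up to floating-point rounding) only for least-squares fits with an intercept; for other models the two quantities can diverge.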
Is a higher R-squared always better?
A higher R-squared value generally indicates a better fit of the model to the data. However, a high R-squared value does not necessarily imply that the model is accurate or reliable. It is essential to consider other factors, such as the complexity of the model, the number of independent variables, and the quality of the data, when evaluating the performance of a regression model. Additionally, it is important to be cautious of overfitting, which occurs when a model becomes too complex and captures the noise in the data rather than the underlying pattern. Overfitting can lead to poor generalization and performance on new, unseen data.
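The overfitting caveat can be illustrated with a toy experiment (all data made up): a high-degree polynomial can reach a perfect training R-squared by interpolating noisy points exactly, yet a plain straight-line fit scores better on held-out data.

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ssr / tss

def lagrange_predict(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial interpolating (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Noisy samples from a roughly linear trend (training set)
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.2, 1.9, 3.2, 3.8]

# Held-out points from the same underlying trend (test set)
x_test = [0.5, 1.5, 2.5, 3.5]
y_test = [0.5, 1.5, 2.5, 3.5]

# A degree-4 polynomial interpolates the training data exactly...
train_pred = [lagrange_predict(x_train, y_train, x) for x in x_train]
print(r_squared(y_train, train_pred))  # 1.0: "perfect" training fit

# ...but a straight-line fit generalizes better to the held-out points.
mx, my = sum(x_train) / len(x_train), sum(y_train) / len(y_train)
slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / \
        sum((a - mx) ** 2 for a in x_train)
intercept = my - slope * mx

poly_test = [lagrange_predict(x_train, y_train, x) for x in x_test]
line_test = [intercept + slope * x for x in x_test]
print(r_squared(y_test, poly_test))  # lower test R-squared than...
print(r_squared(y_test, line_test))  # ...the simpler straight-line model
```

The interpolating polynomial memorizes the noise, so its flawless training R-squared says nothing about performance on new data.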