Adjusted R-squared is a statistical measure used to assess the goodness of fit of a regression model, accounting for the number of predictors used.
In the context of machine learning, regression analysis is a technique used to model the relationship between a dependent variable and one or more independent variables. Adjusted R-squared is a modification of the R-squared metric, which measures the proportion of the variance in the dependent variable that can be explained by the independent variables. The adjusted R-squared takes into account the number of predictors in the model, penalizing models with a large number of predictors to avoid overfitting.
Recent research on adjusted R-squared has explored various aspects and applications of the metric. For example, one study built a prediction model for system testing defects using regression analysis, selecting as its final model one with an adjusted R-squared value greater than 90%. Another study investigated the minimum coverage probability of confidence intervals in regression after variable selection, deriving upper bounds on that coverage probability.
In practical applications, adjusted R-squared can be used to evaluate the performance of machine learning models in various domains. For instance, in real estate price prediction, researchers have used generalized additive models (GAM) with adjusted R-squared to assess the significance of environmental factors in urban centers. In another example, a study on the impact of population mobility on the COVID-19 growth rate used adjusted R-squared to assess how well population mobility explains the growth rate of COVID-19 deaths.
One company case study involves the use of adjusted R-squared in the analysis of capital asset pricing models in the Chinese stock market. By selecting models with high adjusted R-squared values, the study demonstrated the applicability of capital asset pricing models in the Chinese market and provided a set of open-source materials for learning about these models.
In conclusion, adjusted R-squared is a valuable metric for evaluating the performance of regression models in machine learning, taking into account the number of predictors used. Its applications span various domains, from real estate price prediction to epidemiological studies, and it can be a useful tool for both researchers and practitioners in the field.
Adjusted R-Squared Further Reading
1. A Prediction Model for System Testing Defects using Regression Analysis. Muhammad Dhiauddin Mohamed Suffian, Suhaimi Ibrahim. http://arxiv.org/abs/1401.5830v1
2. Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection. Paul Kabaila, Khageswor Giri. http://arxiv.org/abs/0711.0993v1
3. Bounds for Bias-Adjusted Treatment Effect in Linear Econometric Models. Deepankar Basu. http://arxiv.org/abs/2203.12431v1
4. Hedonic Models of Real Estate Prices: GAM and Environmental Factors. Jason R. Bailey, Davide Lauria, W. Brent Lindquist, Stefan Mittnik, Svetlozar T. Rachev. http://arxiv.org/abs/2210.14266v1
5. Evaluating the Data Quality of Eye Tracking Signals from a Virtual Reality System: Case Study using SMI's Eye-Tracking HTC Vive. Dillon J. Lohr, Lee Friedman, Oleg V. Komogortsev. http://arxiv.org/abs/1912.02083v1
6. An Empirical Study of Capital Asset Pricing Model based on Chinese A-share Trading Data. Kai Ren. http://arxiv.org/abs/2305.04838v1
7. Quantitative Relationship between Population Mobility and COVID-19 Growth Rate based on 14 Countries. Benjamin Seibold, Zivjena Vucetic, Slobodan Vucetic. http://arxiv.org/abs/2006.02459v1
8. A non-inferiority test for R-squared with random regressors. Harlan Campbell. http://arxiv.org/abs/2002.08476v2
9. Analysis of variance, coefficient of determination and $F$-test for local polynomial regression. Li-Shan Huang, Jianwei Chen. http://arxiv.org/abs/0810.4808v1
10. Generalized R-squared for Detecting Dependence. Xufei Wang, Bo Jiang, Jun S. Liu. http://arxiv.org/abs/1604.02736v3
Adjusted R-Squared Frequently Asked Questions
What is the difference between R-squared and adjusted R-squared?
R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that can be explained by the independent variables in a regression model. It ranges from 0 to 1, with higher values indicating a better fit. However, R-squared has a limitation: it tends to increase as more predictors are added to the model, even if those predictors do not contribute significantly to the model's performance. Adjusted R-squared, on the other hand, is a modification of R-squared that takes into account the number of predictors in the model. It penalizes models with a large number of predictors to avoid overfitting. Adjusted R-squared is generally considered a more reliable metric for model evaluation, as it provides a more accurate representation of the model's performance when multiple predictors are used.
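This difference can be seen directly in a small numerical sketch. The snippet below (numpy only; the data are synthetic and the helper names are my own, not from any of the cited studies) fits two ordinary-least-squares models, where the second adds a redundant predictor that carries no new information. R-squared is unchanged, because it can never decrease when predictors are added, while adjusted R-squared drops because k increases:

```python
import numpy as np

def r2_score(y, yhat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, k):
    """Adjusted R^2 with n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

# Model 1: intercept + x.  Model 2 adds 2*x, a redundant column that
# spans no new direction, so the fitted values are identical.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, 2.0 * x])

def fit_r2(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return r2_score(y, X @ beta)

r2_1, r2_2 = fit_r2(X1), fit_r2(X2)
adj_1, adj_2 = adjusted_r2(r2_1, n, 1), adjusted_r2(r2_2, n, 2)

print(r2_1, r2_2)    # identical: R-squared never drops when predictors are added
print(adj_1, adj_2)  # adjusted R-squared is strictly lower for the larger model
```

The redundant-column case is the extreme version of the general point: any predictor that adds little explanatory power raises R-squared slightly or not at all, while adjusted R-squared charges a price for it.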
How do you interpret adjusted R-squared in regression?
Adjusted R-squared is interpreted as the proportion of the variance in the dependent variable that can be explained by the independent variables in the model, after accounting for the number of predictors. It ranges from 0 to 1, with higher values indicating a better fit. An adjusted R-squared value close to 1 suggests that the model explains a large portion of the variance in the dependent variable, while a value close to 0 indicates that the model does not explain much of the variance. When comparing different regression models, a higher adjusted R-squared value generally indicates a better model, as it suggests that the model is capturing more of the underlying relationships between the variables while avoiding overfitting.
Should I use R-squared or adjusted R-squared?
In most cases, it is recommended to use adjusted R-squared instead of R-squared when evaluating the performance of a regression model. This is because adjusted R-squared takes into account the number of predictors in the model and penalizes models with a large number of predictors, helping to avoid overfitting. R-squared, on the other hand, tends to increase as more predictors are added to the model, even if those predictors do not contribute significantly to the model's performance. Using adjusted R-squared can provide a more accurate representation of the model's performance, especially when multiple predictors are used.
What does it mean when adjusted R-squared is high?
A high adjusted R-squared value indicates that the regression model explains a large portion of the variance in the dependent variable, after accounting for the number of predictors used. This suggests that the model is capturing the underlying relationships between the variables effectively and is likely to be a good fit for the data. However, it is important to note that a high adjusted R-squared value does not guarantee that the model is perfect or that it will perform well on new, unseen data. It is always essential to validate the model using other evaluation metrics and techniques, such as cross-validation, to ensure its robustness and generalizability.
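As a sketch of that validation step, a simple hold-out check (numpy only; the data and the 50/50 split here are made up for illustration) compares R-squared on the data the model was fit to against R-squared on data it has never seen:

```python
import numpy as np

def r2_score(y, yhat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=(n, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Fit on the first half of the data, evaluate on the second half.
X = np.column_stack([np.ones(n), x])
train, test = slice(0, 100), slice(100, 200)
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

r2_train = r2_score(y[train], X[train] @ beta)
r2_test = r2_score(y[test], X[test] @ beta)
print(r2_train, r2_test)
```

If the hold-out R-squared is far below the in-sample value, the model is likely overfitting regardless of how high its adjusted R-squared is; k-fold cross-validation extends this idea by averaging over several such splits.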
How is adjusted R-squared calculated?
Adjusted R-squared is calculated using the following formula:

Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - k - 1)]

where n is the number of observations in the dataset, k is the number of predictors in the model, and R-squared is the unadjusted R-squared value. The formula scales up the unexplained variance by (n - 1) / (n - k - 1), so models with more predictors are penalized, which helps avoid overfitting.
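The formula translates directly into code. The sketch below (the data are synthetic and the function name is my own) fits a one-predictor ordinary-least-squares model with numpy and computes both the raw and the adjusted value:

```python
import numpy as np

def adjusted_r2(r2, n, k):
    """1 - (1 - R^2) * (n - 1) / (n - k - 1), per the formula above."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Worked example on synthetic data: y depends linearly on one predictor.
rng = np.random.default_rng(0)
n, k = 50, 1
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

adj = adjusted_r2(r2, n, k)
print(r2, adj)  # the adjusted value is slightly below the raw R-squared
```

Because (n - 1) / (n - k - 1) exceeds 1 whenever k is at least 1, the adjusted value is always below the raw R-squared (unless R-squared equals exactly 1), and the gap widens as k grows relative to n.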
Can adjusted R-squared be negative?
Yes, adjusted R-squared can be negative, although it is relatively rare. A negative adjusted R-squared value indicates that the model performs worse than a simple mean model, which predicts the mean of the dependent variable for all observations. This can happen when the model is overfitting the data or when the predictors used in the model do not have a significant relationship with the dependent variable. In practice, a negative adjusted R-squared value is a strong indication that the model should be re-evaluated and potentially improved by using different predictors, removing irrelevant predictors, or applying regularization techniques.
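The formula makes it easy to see how this happens: once the penalty factor outweighs the explained variance, the result goes below zero. A quick arithmetic check (the specific numbers are chosen for illustration):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Modest fit, few observations, many predictors: the penalty dominates.
val = adjusted_r2(0.2, n=20, k=8)
print(val)  # 1 - 0.8 * 19 / 11, about -0.38
```

Here a raw R-squared of 0.2 becomes a negative adjusted value because 8 predictors on only 20 observations leave too few residual degrees of freedom to justify the weak fit.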