Examining Regression Techniques to Forecast Resale Cost of HDB Flats through SHAP Analysis
In real estate analytics, transparency and interpretability are crucial. SHAP (SHapley Additive exPlanations) is a method for interpreting machine learning models, and it changes how data analysts and data scientists approach that task. It was expected to become a trending analytics and data science library from 2022 onward, because it offers insight into the inner workings of complex models.
### Global Analysis
SHAP provides a global view by ranking features by their mean absolute SHAP values, indicating their overall contribution to model predictions. For HDB resale flat prices, common factors such as flat size, distance to the CBD, and lease remaining are typically significant contributors. However, the ranking and magnitude of feature importance can vary based on the model's complexity.
For instance, linear models like Linear Regression offer a straightforward interpretation of each feature's global impact, while more complex models like Random Forest Regressor highlight features not prioritised by linear models, especially those involved in important splits or interactions.
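The global ranking described above can be sketched for the linear case using only NumPy. This is a minimal illustration, not the article's pipeline: the data is synthetic, the column names and coefficients are made up, and the SHAP values use the closed form for a linear model whose features are treated as independent, namely coefficient times the feature's deviation from its mean.

```python
import numpy as np

# Synthetic stand-in for HDB data; names and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 500
feature_names = ["floor_area_sqm", "dist_to_cbd_km", "remaining_lease_years"]
X = np.column_stack([
    rng.uniform(40, 150, n),  # floor area
    rng.uniform(1, 20, n),    # distance to the CBD
    rng.uniform(40, 99, n),   # remaining lease
])
price = (4000 * X[:, 0] - 15000 * X[:, 1] + 3000 * X[:, 2]
         + rng.normal(0, 20000, n))

# Fit ordinary least squares with an explicit intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, beta = coef[0], coef[1:]

# Linear model, features treated as independent:
# SHAP value of feature j for row i is beta_j * (x_ij - mean_j).
shap_values = beta * (X - X.mean(axis=0))

# Global importance: mean absolute SHAP value per feature, largest first.
mean_abs = np.abs(shap_values).mean(axis=0)
order = np.argsort(mean_abs)[::-1]
ranking = [(feature_names[j], float(mean_abs[j])) for j in order]

for name, score in ranking:
    print(f"{name:>24s}  {score:12.1f}")
```

Because the values are expressed in price units, the ranking reflects both the size of each coefficient and how much the feature varies across flats, which is why a feature with a large coefficient but a narrow range can still rank low.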
### Visualization and Directionality
SHAP's beeswarm plots show not only feature importance but also the direction of each effect, that is, whether higher values of a feature increase or decrease the predicted price. This is particularly useful for complex models like Random Forest, where the relationship between features and outcome isn't strictly linear.
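A rough numerical stand-in for what a beeswarm plot shows visually is the correlation between a feature's values and its SHAP values. The sketch below assumes a hypothetical additive price model (linear in floor area, exponentially decaying in distance to the CBD). For such an additive model, each feature's SHAP value is simply its own term minus that term's average over the data. None of the names or numbers come from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
floor_area = rng.uniform(40, 150, n)  # square metres (synthetic)
dist_cbd = rng.uniform(1, 20, n)      # kilometres (synthetic)

# Hypothetical additive model: price = area_effect + dist_effect.
def area_effect(a):
    return 4000.0 * a

def dist_effect(d):
    return 200000.0 * np.exp(-d / 5.0)  # nonlinear decay with distance

# SHAP value of a feature in an additive model:
# its own term minus the average of that term over the data.
shap_area = area_effect(floor_area) - area_effect(floor_area).mean()
shap_dist = dist_effect(dist_cbd) - dist_effect(dist_cbd).mean()

def direction(feature, shap_col):
    """Return the correlation and a label for the overall direction."""
    r = float(np.corrcoef(feature, shap_col)[0, 1])
    return r, ("raises" if r > 0 else "lowers")

r_area, label_area = direction(floor_area, shap_area)
r_dist, label_dist = direction(dist_cbd, shap_dist)
print(f"higher floor area {label_area} the prediction (r = {r_area:.2f})")
print(f"higher distance to CBD {label_dist} the prediction (r = {r_dist:.2f})")
```

A single correlation summarises direction only when the effect is monotonic; the beeswarm plot remains the better tool when a feature's effect changes sign across its range.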
### Local Analysis
Local SHAP analysis explains individual predictions by decomposing the output into contributions from each feature. This is especially valuable for understanding why a specific HDB flat was predicted to have a certain price, regardless of the underlying model.
For instance, for a particular resale flat, the remaining years of lease may make a strongly negative contribution to the predicted selling price, while the flat's size makes a strongly positive contribution.
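The decomposition behind a local explanation can be sketched by computing exact Shapley values by brute force over all feature coalitions. This is only feasible for a handful of features and is not how the `shap` library computes values for tree models. The `predict` function, the background data, and the example flat are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np

def predict(X):
    """Hypothetical price model with an area-lease interaction."""
    area, dist, lease = X[:, 0], X[:, 1], X[:, 2]
    return 3000 * area - 10000 * dist + 2000 * lease + 20 * area * lease

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x, averaged over a background set."""
    m = x.shape[0]

    def value(subset):
        # Features in `subset` take the values in x;
        # all other features keep their background values.
        idx = np.array(subset, dtype=int)
        Z = background.copy()
        Z[:, idx] = x[idx]
        return predict(Z).mean()

    phi = np.zeros(m)
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi

rng = np.random.default_rng(2)
background = np.column_stack([
    rng.uniform(40, 150, 100),  # floor area
    rng.uniform(1, 20, 100),    # distance to the CBD
    rng.uniform(40, 99, 100),   # remaining lease
])
x = np.array([130.0, 8.0, 45.0])  # a large flat with a short remaining lease

phi = shapley_values(predict, x, background)
base_value = predict(background).mean()
prediction = predict(x[None, :])[0]

for name, contribution in zip(["floor_area", "dist_to_cbd", "lease"], phi):
    print(f"{name:>12s}: {contribution:+.0f}")
print(f"base {base_value:.0f} + contributions {phi.sum():+.0f} = {prediction:.0f}")
```

The contributions always sum to the gap between this flat's prediction and the average prediction over the background set, which is what makes the decomposition additive.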
### Key Takeaways
- **Global SHAP** offers a ranking of feature importance and the overall direction of their effects, with the richness of interpretation increasing with model complexity.
- **Local SHAP** provides detailed, individualised explanations, crucial for understanding specific predictions, especially in complex models like Random Forest where global summaries can obscure important local phenomena.
- **Model choice matters**: While SHAP is model-agnostic, the depth and nuance of both global and local explanations are greatest when applied to capable, nonlinear models. Linear models offer transparent but limited insights, whereas tree-based models can reveal intricate patterns that global summaries might miss.
In summary, SHAP dramatically enhances both global and local interpretability of HDB resale price predictions, provided the underlying model has learned meaningful patterns from the data. It bridges the gap between high-performing, complex models and the need for understandable, actionable insights in real estate analytics.
Data preprocessing and wrangling used the HDB Resale Flat Prices dataset from Data.gov.sg and the OneMap API. The article predicts the selling price of HDB resale flats transacted from January 1990 to January 2022 using Dummy Regressor, Linear Regression, and Random Forest Regressor. SHAP values were calculated for the Random Forest Regressor model. New MRT stations that opened at the end of August last year are included in the data preprocessing stage. A greater distance from the resale flat to the CBD has a negative impact on the predicted selling price, while a shorter distance has a positive impact.
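The three-model comparison can be sketched as follows. It assumes that "Dummy Regressor", "Linear Regression", and "Random Forest Regressor" refer to the scikit-learn estimators of those names, and it uses synthetic data with made-up columns in place of the Data.gov.sg dataset. The hyperparameters are illustrative, and the SHAP step is not reproduced here.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data (hypothetical columns).
rng = np.random.default_rng(42)
n = 2000
area = rng.uniform(40, 150, n)
dist = rng.uniform(1, 20, n)
lease = rng.uniform(40, 99, n)
X = np.column_stack([area, dist, lease])
# Price falls nonlinearly with distance to the CBD.
y = (4000 * area + 200000 * np.exp(-dist / 5) + 2500 * lease
     + rng.normal(0, 20000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "dummy": DummyRegressor(strategy="mean"),  # baseline: predicts the mean
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name:>14s}  R^2 = {score:.3f}")
```

The dummy baseline gives a floor for the other two scores: a model is only worth explaining with SHAP if it clearly beats predicting the mean price.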