
🌳 04 — Tree-Based Regression

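Tree models split the feature space into regions and predict, for each region, a value computed from the training examples that fall into it (for regression, typically their mean target).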


Decision Tree Regressor

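from sklearn.tree import DecisionTreeRegressor

# max_depth caps the number of split levels; random_state makes the fit reproducible
tree = DecisionTreeRegressor(max_depth=4, random_state=42)
tree.fit(X_train, y_train)  # X_train, y_train: training features and targets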
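To make the "regions" idea concrete, here is a minimal sketch on hypothetical toy data (`X_toy`, `y_toy`, and `stump` are illustrative names, not part of this tutorial's dataset). It fits a depth-1 tree, which makes a single split, and checks that each leaf predicts the mean target of the training samples routed to it. This holds for the default squared-error criterion.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: one feature, target jumps from about 1 to about 5 at x = 0.5
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 1, size=(200, 1))
y_toy = np.where(X_toy[:, 0] < 0.5, 1.0, 5.0) + rng.normal(0, 0.1, size=200)

# A depth-1 tree makes one split, so it produces exactly two regions (leaves)
stump = DecisionTreeRegressor(max_depth=1, random_state=42)
stump.fit(X_toy, y_toy)

leaf_ids = stump.apply(X_toy)   # index of the leaf each sample lands in
preds = stump.predict(X_toy)    # prediction for each sample

# Compare each leaf's prediction with the mean target of its samples
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print(f"leaf {leaf}: prediction={preds[in_leaf][0]:.3f}, "
          f"mean target={y_toy[in_leaf].mean():.3f}")
```

Deeper trees repeat the same step inside each region, so the prediction becomes a finer piecewise-constant function.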

Pros:

  • captures nonlinear patterns
  • no scaling required
  • easy to visualize and explain when the tree is shallow

Cons:

  • overfits easily when depth is not limited
  • unstable: small changes in the training data can produce a very different tree
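The overfitting risk can be illustrated with a small sketch. The data here is synthetic (a noisy sine curve) and every name (`X_demo`, `deep`, `shallow`) is hypothetical. It compares an unrestricted tree with a depth-limited one on training and held-out R².

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: noisy sine curve with one feature
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Unlimited depth: the tree keeps splitting until it memorizes the training set
deep = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)
# Depth limit: at most 2**4 = 16 leaves, so noise cannot be fit point by point
shallow = DecisionTreeRegressor(max_depth=4, random_state=42).fit(X_tr, y_tr)

for name, model in [("unlimited depth", deep), ("max_depth=4", shallow)]:
    print(f"{name}: train R2={model.score(X_tr, y_tr):.3f}, "
          f"test R2={model.score(X_te, y_te):.3f}")
```

The unrestricted tree reaches a near-perfect training score, and the gap between its training and test scores is the signature of overfitting. The depth-limited tree usually shows a much smaller gap.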

Random Forest Regressor

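from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,   # number of trees in the forest
    max_depth=None,     # default: each tree grows until its leaves are pure or too small to split
    random_state=42
)
rf.fit(X_train, y_train)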

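A random forest trains many trees, each on a bootstrap sample of the training data, and averages their predictions. The averaging reduces variance, which is why a forest overfits less than a single deep tree.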
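The averaging step can be checked directly. This sketch uses synthetic data from `make_regression` and hypothetical names (`forest`, `per_tree`). It collects the predictions of the individual trees in `estimators_` and compares their mean with the forest's own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic regression data
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(X_demo, y_demo)

# Predictions of every individual tree, shape (n_trees, n_samples)
per_tree = np.stack([t.predict(X_demo) for t in forest.estimators_])

# The forest's prediction is the mean over its trees
manual_average = per_tree.mean(axis=0)
print("matches forest.predict:", np.allclose(manual_average, forest.predict(X_demo)))

# Individual trees disagree with each other; averaging smooths that out
print("mean std across trees:", per_tree.std(axis=0).mean())
```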


Gradient Boosting

from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor(random_state=42)
gb.fit(X_train, y_train)

Boosting builds trees sequentially: each new tree is fit to the errors the ensemble has made so far, and its scaled prediction is added to the running total.
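The sequential idea can be sketched by hand for squared-error loss, where "correcting previous errors" means fitting each tree to the current residuals. This is a simplified illustration, not scikit-learn's implementation. The data is synthetic, and `learning_rate`, `n_rounds`, and the depth of 2 are arbitrary assumptions chosen for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.2, size=300)

learning_rate = 0.1   # shrinks each tree's contribution (assumed value)
n_rounds = 50         # number of boosting stages (assumed value)

# Start from a constant prediction: the mean of the target
prediction = np.full_like(y_demo, y_demo.mean())
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(n_rounds):
    residuals = y_demo - prediction            # what the ensemble still gets wrong
    stage = DecisionTreeRegressor(max_depth=2, random_state=42)
    stage.fit(X_demo, residuals)               # small tree fit to the residuals
    prediction += learning_rate * stage.predict(X_demo)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"training MSE: start={mse_history[0]:.4f}, end={mse_history[-1]:.4f}")
```

Training error shrinks with every round, so the number of rounds and the learning rate should be judged on held-out data rather than on training error.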


Feature Importance

import pandas as pd

# Index by the columns the forest was trained on (assumes X_train is a DataFrame)
importance = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importance.sort_values(ascending=False))

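These importances measure how much each feature reduced prediction error in the trees' splits. They are useful for ranking features, but a high score shows that the model relied on a feature, not that the feature causes the target.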
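A small synthetic sketch shows why importance should not be read as causation. All names here are hypothetical: `driver` generates the target, `proxy` is only a noisy copy of `driver`, and `noise` is unrelated. Setting `max_features=1` is an assumption made for the demo so that each split considers a single random feature, which forces the forest to use the proxy regularly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
driver = rng.normal(size=n)                    # the only feature that generates y
proxy = driver + rng.normal(0, 0.1, size=n)    # correlated with driver, no effect on y
noise = rng.normal(size=n)                     # unrelated to y
y_demo = 3.0 * driver + rng.normal(0, 0.5, size=n)

X_demo = pd.DataFrame({"driver": driver, "proxy": proxy, "noise": noise})

# max_features=1: each split looks at one randomly chosen feature (demo assumption)
forest = RandomForestRegressor(n_estimators=100, max_features=1, random_state=42)
forest.fit(X_demo, y_demo)

importance = pd.Series(forest.feature_importances_, index=X_demo.columns)
print(importance.sort_values(ascending=False))
```

The proxy receives a substantial share of the importance even though it plays no part in generating the target, because it carries nearly the same information as the true driver.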


Next

➡️ 05-regression-metrics