🌳 04 — Tree-Based Regression

Tree models recursively split the feature space into regions and predict, for every point in a region, the average target of the training examples that fall there.
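
As a rough illustration of the "regions" idea, the sketch below fits a very shallow tree on synthetic data and checks that each leaf predicts the mean target of its own training samples. The data and the names `X_demo`, `y_demo`, and `small_tree` are made up for this example and are not part of the tutorial's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data (illustrative assumption, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

# A depth-2 tree has at most 4 leaves, i.e. at most 4 regions
small_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
small_tree.fit(X_demo, y_demo)

leaf_ids = small_tree.apply(X_demo)  # leaf (region) index for each sample
preds = small_tree.predict(X_demo)

for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    # All samples in a leaf share one prediction: the mean target in that leaf
    print(f"leaf {leaf}: n={mask.sum()}, "
          f"mean y={y_demo[mask].mean():.3f}, prediction={preds[mask][0]:.3f}")
```

With the default squared-error criterion, the two numbers printed for each leaf should agree, which is why a single tree produces a piecewise-constant prediction.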

Decision Tree Regressor

from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

Pros:

captures nonlinear patterns and feature interactions
no feature scaling required
easy to visualize and explain when the tree is shallow

Cons:

can overfit easily, especially when depth is not limited
unstable: small changes in the training data can produce a very different tree
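
The overfitting risk can be seen by comparing training and test scores for an unconstrained tree and a depth-limited one. This is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the `scores` dictionary are illustrative names, and the exact numbers will depend on the data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data (illustrative assumption)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

scores = {}
for depth in (None, 4):
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_tr, y_tr)
    # .score returns R^2 for regressors
    scores[depth] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"max_depth={depth}: train R^2={scores[depth][0]:.3f}, "
          f"test R^2={scores[depth][1]:.3f}")
```

The unconstrained tree memorizes the training set, so its training score is essentially perfect while its test score is noticeably lower. Limiting `max_depth` usually narrows that gap.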

Random Forest Regressor

from sklearn.ensemble import RandomForestRegressor

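rf = RandomForestRegressor(
    n_estimators=200,   # number of trees to average
    max_depth=None,     # no depth limit on the individual trees
    random_state=42     # makes the bootstrap sampling reproducible
)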
rf.fit(X_train, y_train)

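A random forest averages the predictions of many trees, each trained by default on a different bootstrap sample of the data. Averaging lowers the variance of the individual trees, which reduces overfitting.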
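
To make the averaging concrete, the sketch below rebuilds a forest's prediction by hand from its individual trees, which scikit-learn exposes through the `estimators_` attribute. The synthetic data and the names `forest`, `per_tree`, and `manual_average` are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo[:, 0] ** 2 + X_demo[:, 1] + rng.normal(scale=0.2, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Predictions of every individual tree, shape (n_trees, n_samples)
per_tree = np.stack([t.predict(X_demo) for t in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's prediction is the mean of its trees' predictions
print(np.allclose(manual_average, forest.predict(X_demo)))

# Average disagreement between trees; this spread is what averaging smooths out
print(per_tree.std(axis=0).mean())
```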

Gradient Boosting

from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor(random_state=42)
gb.fit(X_train, y_train)

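Boosting builds shallow trees sequentially. Each new tree is fit to the errors left by the ensemble so far (for squared-error loss, the residuals), and its contribution is scaled by a learning rate.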
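
The sequential idea can be sketched by hand for squared-error loss: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a scaled version of its output. This is a simplified illustration, not how `GradientBoostingRegressor` is implemented internally. The values of `learning_rate` and `n_rounds` and the synthetic data are arbitrary choices for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.2, size=300)

learning_rate = 0.1  # shrinks each tree's contribution
n_rounds = 50        # number of trees added one after another

# Round 0: a constant prediction equal to the mean target
current_pred = np.full_like(y_demo, y_demo.mean())
mse_history = [np.mean((y_demo - current_pred) ** 2)]

for _ in range(n_rounds):
    residuals = y_demo - current_pred            # what is still unexplained
    weak_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    weak_tree.fit(X_demo, residuals)             # learn to predict the residuals
    current_pred += learning_rate * weak_tree.predict(X_demo)
    mse_history.append(np.mean((y_demo - current_pred) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}")
```

The training error shrinks with each round. Whether the test error also improves depends on the learning rate, the tree depth, and the number of rounds, so those are usually tuned on validation data.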

Feature Importance

import pandas as pd

# Assumes X_train is the pandas DataFrame the forest was fitted on
importance = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importance.sort_values(ascending=False))

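Feature importance shows which features the model relies on, not which features cause the target. Impurity-based importances are computed on the training data, can be inflated for high-cardinality features, and may be split among correlated features.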
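
One common cross-check is permutation importance on held-out data, available as `sklearn.inspection.permutation_importance`. The sketch below compares it with the impurity-based values on synthetic data where only one feature carries signal. The column names (`signal`, `noise_a`, `noise_b`) and the data are invented for illustration, and neither measure implies causation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only "signal" influences the target (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(400, 3)), columns=["signal", "noise_a", "noise_b"]
)
y_demo = 3.0 * X_demo["signal"] + rng.normal(scale=0.5, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in score
result = permutation_importance(
    forest, X_te, y_te, n_repeats=10, random_state=0
)

comparison = pd.DataFrame(
    {
        "impurity_based": forest.feature_importances_,
        "permutation": result.importances_mean,
    },
    index=X_demo.columns,
)
print(comparison.sort_values("permutation", ascending=False))
```

If the two columns rank the features very differently, that is a hint to treat the impurity-based ranking with caution.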

Next

➡️ 05-regression-metrics