🤖 04 — Modeling Pipeline
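Baseline
Fit a trivial baseline first. Every real model should beat its score.
```python
from sklearn.dummy import DummyClassifier

# Always predicts the most frequent class seen in y_train, ignoring the features.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
```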
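To show what the baseline score means, here is a minimal sketch on hypothetical imbalanced toy data. The 80/20 labels, the stratified split, and the `balanced_accuracy_score` check are illustrative choices and not part of the original pipeline. With `strategy="most_frequent"`, accuracy equals the majority-class share of the test set (0.8 here), while balanced accuracy falls to 0.5 because the minority class is never predicted.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: random features, 80 negatives and 20 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)

# Stratify so the train and test sets keep the 80/20 class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

# Plain accuracy rewards always predicting the majority class.
baseline_acc = baseline.score(X_test, y_test)
majority_share = np.mean(y_test == 0)

# Balanced accuracy averages per-class recall, exposing the useless baseline.
baseline_bal_acc = balanced_accuracy_score(y_test, baseline.predict(X_test))

print(f"accuracy: {baseline_acc:.2f}, balanced accuracy: {baseline_bal_acc:.2f}")
```

On imbalanced data, a model that only matches the baseline's accuracy may have learned nothing, so it helps to also report a metric that accounts for class imbalance.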
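Real Models
Then fit real models and check that each one beats the baseline:
- Logistic Regression (`sklearn.linear_model.LogisticRegression`)
- Random Forest (`sklearn.ensemble.RandomForestClassifier`)
- Gradient Boosting (`sklearn.ensemble.GradientBoostingClassifier`)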
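One way to compare these candidates against the baseline is 5-fold cross-validation. The sketch below is illustrative only: it uses a synthetic dataset from `make_classification` in place of the real features, and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real training data (roughly balanced classes).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # For classifiers, cv=5 uses stratified folds by default.
    cv_scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    scores[name] = cv_scores.mean()
    print(f"{name:>20}: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
```

Reporting the standard deviation across folds helps show whether a gap between two models is larger than the fold-to-fold noise.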
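Pipeline Template
```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numeric_features and categorical_features are lists of column names in X_train.
# Columns listed in neither are dropped (ColumnTransformer's default remainder="drop").
preprocess = ColumnTransformer([
    # Scale numeric columns to zero mean and unit variance.
    ("num", StandardScaler(), numeric_features),
    # One-hot encode categoricals; categories unseen during fit become all-zero rows.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# Wrapping preprocessing and model together means the scaler and encoder are fit
# only on the data passed to fit(), which avoids leakage during cross-validation.
model = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
```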
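Because the template depends on `X_train`, `numeric_features`, and `categorical_features` being defined elsewhere, here is a self-contained sketch that runs it end to end. The DataFrame columns (`age`, `income`, `city`), the label rule, and the city names are hypothetical toy data. The last step shows `handle_unknown="ignore"` at work: a city never seen during training is accepted at prediction time instead of raising an error.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["paris", "berlin", "rome"], size=n),
})
y = (df["income"] > 50_000).astype(int)  # hypothetical label rule

numeric_features = ["age", "income"]
categorical_features = ["city"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)

# 2 scaled numeric columns + 3 one-hot city columns.
n_features_out = model.named_steps["preprocess"].transform(X_train).shape[1]

# "madrid" was never seen during fit; it is encoded as all zeros rather than raising.
unseen = pd.DataFrame({"age": [30], "income": [60_000.0], "city": ["madrid"]})
pred = model.predict(unseen)
print(f"test accuracy: {test_acc:.2f}, prediction for unseen city: {pred[0]}")
```

The same `preprocess` step can be reused with any of the models listed above by swapping the estimator in the final `("model", ...)` step.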