🏷️ 03 — Categorical Features¶
Most machine learning models expect numeric inputs, so categorical features usually have to be encoded as numbers before training.
One-Hot Encoding¶
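One-hot encoding creates one binary column per category. It is a good fit for nominal categories, which have no inherent order, such as: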
- city
- product type
- browser
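As a minimal sketch of one-hot encoding in pandas, the example below encodes a small, made-up `browser` column with `pd.get_dummies`. The data and the `one_hot` variable name are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only).
df = pd.DataFrame({"browser": ["Chrome", "Firefox", "Safari", "Chrome"]})

# One binary column per distinct browser; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(df, columns=["browser"], dtype=int)

print(one_hot)
```

Each row has exactly one `1` across the new columns. For linear models, passing `drop_first=True` drops one of the columns, which removes the redundancy between them.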
Ordinal Encoding¶
Use when categories have a real order, for example small < medium < large.
Do not ordinal-encode unordered categories like city: the integer codes would imply a ranking that does not exist.
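One way to ordinal-encode in pandas is to state the order explicitly and take the category codes. This is a sketch under assumptions: the `size` column, its values, and the `size_order` list are hypothetical examples.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only).
df = pd.DataFrame({"size": ["medium", "small", "large", "small"]})

# Spell out the order explicitly; alphabetical order would be wrong here.
size_order = ["small", "medium", "large"]

# Codes follow the position in size_order: small=0, medium=1, large=2.
# Values missing from size_order would be encoded as -1.
df["size_code"] = pd.Categorical(
    df["size"], categories=size_order, ordered=True
).codes

print(df)
```

Writing the order out by hand keeps the mapping visible and avoids relying on whatever order the values happen to appear in.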
Rare Categories¶
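High-cardinality categories can lead to overfitting, because rare values appear in too few rows to learn from reliably. A common fix is to keep the most frequent values and group the rest into a single "Other" bucket: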
top = df["city"].value_counts().head(10).index
df["city_clean"] = df["city"].where(df["city"].isin(top), "Other")
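The snippet above picks the frequent cities from the whole frame. When the data is split into training and test sets, a common precaution is to learn the frequent values from the training set only and reuse them on the test set. The sketch below assumes two hypothetical frames, `train` and `test`, with made-up cities.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only).
train = pd.DataFrame({"city": ["Paris", "Paris", "Paris", "Rome", "Rome", "Oslo"]})
test = pd.DataFrame({"city": ["Paris", "Oslo", "Lima"]})

# Learn the frequent categories from the training data only.
top = train["city"].value_counts().head(2).index

# Apply the same mapping to both frames; unseen or rare cities become "Other".
for frame in (train, test):
    frame["city_clean"] = frame["city"].where(frame["city"].isin(top), "Other")

print(test)
```

With this approach, a city that appears only in the test set, such as `Lima` here, falls into the same "Other" bucket as rare training cities.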
Target Encoding Warning¶
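Target encoding replaces each category with a statistic of the target, typically its mean. It can be powerful but dangerous: if a row's own target value contributes to its encoding, target information leaks into the features.
If you need it, compute the encodings out-of-fold with cross-validation, so that no row is encoded using its own target value.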
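A minimal sketch of out-of-fold mean target encoding is shown below, using only pandas and NumPy. The helper name `oof_target_encode`, its parameters, and the toy data are assumptions made for illustration. It does no smoothing, which a production encoder would normally add.

```python
import numpy as np
import pandas as pd

def oof_target_encode(df, col, target, n_splits=5, seed=0):
    """Out-of-fold mean target encoding (hypothetical helper, no smoothing)."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    # Shuffle row positions and split them into roughly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(df)), n_splits)
    for holdout in folds:
        mask = np.zeros(len(df), dtype=bool)
        mask[holdout] = True
        train = df[~mask]
        # Category means computed WITHOUT the held-out rows.
        means = train.groupby(col)[target].mean()
        # Categories unseen in the other folds fall back to their overall mean.
        fallback = train[target].mean()
        encoded.iloc[holdout] = (
            df[col].iloc[holdout].map(means).fillna(fallback).to_numpy()
        )
    return encoded

# Toy data (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "y":    [1, 0, 1, 0, 0, 1, 1, 1],
})
toy["city_te"] = oof_target_encode(toy, "city", "y", n_splits=4)
print(toy)
```

Because each row is encoded from the other folds only, a category that occurs in a single row never sees its own target and receives the fallback value instead.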