📐 02 — Linear Regression

Linear regression models the target as a weighted sum of the features plus an intercept: a straight line with one feature, a hyperplane with several.

y = b0 + b1*x1 + b2*x2 + ... + error

Example

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Load the diabetes dataset as pandas objects.
data = load_diabetes(as_frame=True)
X = data.data
y = data.target

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))  # average error, in target units
print("R^2:", r2_score(y_test, y_pred))             # fraction of variance explained
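
To connect the fitted model back to the formula above: a prediction is just the intercept plus the dot product of the coefficients with one row of features. A minimal check continuing from the example; the first test row is an arbitrary choice:

# Manual prediction for one row, matching y = b0 + b1*x1 + ... above.
first = X_test.iloc[0]
manual = model.intercept_ + first @ model.coef_
print(manual)                              # same value as model.predict, up to rounding
print(model.predict(X_test.iloc[[0]])[0])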

Interpreting Coefficients

import pandas as pd

# Pair each coefficient with its feature name.
coef = pd.Series(model.coef_, index=X.columns)
print(coef.sort_values())

A coefficient is the change in the prediction when that feature increases by one unit, with all other features held constant.
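
To see this concretely, take one row, raise a single feature by one unit while holding the rest fixed, and the prediction moves by exactly that coefficient. A minimal sketch continuing from the code above; the choice of the bmi column is arbitrary:

# Bump one feature by one unit and watch the prediction change.
row = X_test.iloc[[0]].copy()
bumped = row.copy()
bumped["bmi"] += 1                      # other features stay constant

delta = model.predict(bumped)[0] - model.predict(row)[0]
print(delta)                            # equals coef["bmi"] up to float rounding
print(coef["bmi"])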


Assumptions

  • relationship is roughly linear
  • errors are independent
  • no extreme multicollinearity
  • residuals have roughly constant variance and no systematic pattern

Linear regression stays useful even when these assumptions are imperfect, but coefficient interpretation and error estimates become less trustworthy.
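
A quick way to eyeball the residual assumptions is to plot residuals against predictions: a shapeless cloud around zero is reassuring, while a funnel or curve signals trouble. A minimal sketch, assuming matplotlib is installed, continuing from the example above:

import matplotlib.pyplot as plt

# Residuals vs. predictions: look for a patternless band around zero.
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residuals vs. predictions")
plt.show()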


Common Mistakes

  • treating correlation as causation
  • ignoring outliers
  • interpreting coefficients of scaled features as if they were in the original units (see the sketch after this list)
  • expecting linear regression to capture complex nonlinear patterns
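
To illustrate the scaling point: after standardizing, each coefficient means "change in y per one standard deviation of that feature", not per original unit. A minimal sketch using a StandardScaler pipeline; dividing by each feature's fitted standard deviation recovers the per-unit coefficients:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same model, but fit on standardized features.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

scaler = pipe.named_steps["standardscaler"]
lin = pipe.named_steps["linearregression"]

# Coefficients are now per standard deviation of each feature;
# dividing by scaler.scale_ converts them back to per-unit terms.
scaled_coef = pd.Series(lin.coef_, index=X.columns)
print((scaled_coef / scaler.scale_).sort_values())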

Next

➡️ 03-ridge-lasso-elasticnet