Diagnostic plots

Many algorithms come with inherent assumptions. Linear regression, for example, assumes a linear relationship between the features and the target, and normally distributed noise whose variance is constant across the domain.

In classic linear regression, we use residual plots to evaluate these assumptions: a QQ-plot of the residuals to check for normality, and a residuals plot (a scatterplot of residuals vs. predictions) to check for linearity and homoscedasticity (constant variance).

On Kaggle

Kaggle provides the Yellowbrick library, which produces diagnostic plots for scikit-learn models.

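from sklearn import linear_model, pipeline, preprocessing
from yellowbrick import regressor

# make_pipeline takes the steps as positional arguments, not as a list
model = pipeline.make_pipeline(
  preprocessing.StandardScaler(),
  linear_model.Lasso()
  )
visualizer = regressor.ResidualsPlot(model)

visualizer.fit(Xtrain, ytrain)   # fit the model and draw the training residuals
visualizer.score(Xval, yval)     # draw the validation residuals
visualizer.show()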

Alternatively, you can fit the model separately and use the quick method to visualize:

model.fit(Xtrain, ytrain)
# is_fitted=True tells Yellowbrick that the model is already fitted
regressor.residuals.residuals_plot(model,
  Xtrain, ytrain, X_test=Xval, y_test=yval,
  is_fitted=True)