Many algorithms come with inherent assumptions: linear regression expects a linear relationship between features and target, and normally distributed noise whose variance is constant across the domain.
In classic linear regression, we use residual plots to evaluate these assumptions: a QQ-plot of the residuals to check for normality, and a residuals plot (a scatterplot of residuals versus predictions) to check for linearity and homoscedasticity (constant variance).
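To see what these diagnostics look like without any extra tooling, here is a minimal sketch that builds both plots with matplotlib and scipy, assuming a fitted scikit-learn regressor named model and validation data Xval, yval as in the snippets below:

import matplotlib.pyplot as plt
import scipy.stats

preds = model.predict(Xval)
residuals = yval - preds    # observed minus predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# QQ-plot: residual quantiles against theoretical normal quantiles
scipy.stats.probplot(residuals, dist="norm", plot=ax1)
ax1.set_title("QQ-plot of residuals")

# residuals vs. predictions scatter
ax2.scatter(preds, residuals, alpha=0.5)
ax2.axhline(0, color="grey", linestyle="--")
ax2.set_xlabel("Predicted value")
ax2.set_ylabel("Residual")
ax2.set_title("Residuals vs. predictions")

plt.show()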
The open-source Yellowbrick library produces these diagnostic plots on top of scikit-learn and matplotlib:
from sklearn import linear_model, pipeline, preprocessing
from yellowbrick import regressor

# standardize the features, then fit a Lasso regression
model = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    linear_model.Lasso(),
)

visualizer = regressor.ResidualsPlot(model)
visualizer.fit(Xtrain, ytrain)    # fits the wrapped model on the training data
visualizer.score(Xval, yval)      # computes residuals on the validation data
visualizer.show()
Alternatively, you can fit the model yourself and use one of Yellowbrick's quick methods to produce the same plot:
model.fit(Xtrain, ytrain)
regressor.residuals_plot(model, Xtrain, ytrain,
                         X_test=Xval, y_test=yval,
                         is_fitted=True)    # is_fitted=True skips refitting the model
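In either case, the reading is the same: residuals that follow the diagonal in the QQ-plot support the normality assumption, a curved band in the residuals-vs-predictions scatter points to a missed non-linearity, and a funnel shape (spread growing with the prediction) points to heteroscedasticity.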