Hyperparameter tuning

Models have parameters

  • linear regression coefficients
  • locations of cluster centers
  • ...

The process of training a model has hyperparameters

  • regularization strength (for Lasso / Ridge Regression)
  • $k$ (for $k$-nearest neighbors)
  • dimension (for PCA dimensionality reduction)
  • ...

Hyperparameter Tuning

from sklearn import neighbors, model_selection

model1 = neighbors.KNeighborsClassifier(3)
model2 = neighbors.KNeighborsClassifier(4)
model3 = neighbors.KNeighborsClassifier(5)
model4 = neighbors.KNeighborsClassifier(10)
model5 = neighbors.KNeighborsClassifier(50)

scores1 = model_selection.cross_val_score(model1, X, y)
scores2 = model_selection.cross_val_score(model2, X, y)
scores3 = model_selection.cross_val_score(model3, X, y)
scores4 = model_selection.cross_val_score(model4, X, y)
scores5 = model_selection.cross_val_score(model5, X, y)

Hyperparameter Tuning

ks = [3,4,5,10,50]
models = [neighbors.KNeighborsClassifier(k) for k in ks]
scores = [model_selection.cross_val_score(model,X,y) 
          for model in models]
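Each call to cross_val_score returns one score per fold; a minimal sketch (assuming numpy is imported as np) for picking the best $k$ by mean score:

import numpy as np

# average over folds, then take the k whose mean score is highest
mean_scores = [s.mean() for s in scores]
best_k = ks[int(np.argmax(mean_scores))]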

Hyperparameter Tuning

ks = [3,4,5,10,50]
weights = ["uniform", "distance"]
models = [neighbors.KNeighborsClassifier(k, weights=w) 
          for k in ks for w in weights]
scores = [model_selection.cross_val_score(model,X,y) 
          for model in models]

This is called grid search: systematically trying all combinations of the hyperparameter values (imagine a grid with one axis per hyperparameter) to find the best option.

Hyperparameter Tuning

parameters = {
    'kneighborsclassifier__n_neighbors': [3,4,5,10,50],
    'kneighborsclassifier__weights': ["uniform", "distance"]  
}
from sklearn import pipeline, preprocessing

model = pipeline.make_pipeline(preprocessing.StandardScaler(),
                               neighbors.KNeighborsClassifier())
gscv = model_selection.GridSearchCV(model, parameters, cv=5)
gscv.fit(X, y)

You can also give a list of parameter dictionaries to GridSearchCV, if the availability of some parameters depends on the value of other parameters.
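For instance (an illustrative sketch, not from the original slides): for $k$-nearest neighbors, the Minkowski exponent p is only meaningful when metric="minkowski", so each dictionary lists only combinations that are valid together:

parameters = [
    # p only applies to the Minkowski metric
    {'kneighborsclassifier__metric': ['minkowski'],
     'kneighborsclassifier__p': [1, 2, 3]},
    # other metrics have no p parameter
    {'kneighborsclassifier__metric': ['chebyshev']},
]
gscv = model_selection.GridSearchCV(model, parameters, cv=5)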

Hyperparameter Tuning

A GridSearchCV object has important members:

Member            Contains
.predict()        calls predict() on the best estimator
.transform()      calls transform() on the best estimator
.best_estimator_  the best estimator found by the search
.best_score_      the cross-validation score of the best estimator
.best_params_     the parameters of the best estimator
.cv_results_      detailed results (timings, parameter combinations, scores) for all CV runs
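To inspect .cv_results_, a handy pattern (a sketch assuming pandas is imported as pd) is to load it into a DataFrame:

import pandas as pd

# one row per parameter combination; rank_test_score is 1 for the winner
results = pd.DataFrame(gscv.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))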

Example: MNIST Digits

In [3]:
fig
Out[3]:
[figure output omitted]

Example: MNIST Digits

In [4]:
parameters = {
    'kneighborsclassifier__n_neighbors': [3,4,5,10,50],
    'kneighborsclassifier__weights': ["uniform", "distance"]
}
model = pipeline.make_pipeline(preprocessing.StandardScaler(), 
                               neighbors.KNeighborsClassifier())
gscv = model_selection.GridSearchCV(model, parameters, cv=5)
gscv.fit(X_train, y_train)
Out[4]:
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('standardscaler', StandardScaler()),
                                       ('kneighborsclassifier',
                                        KNeighborsClassifier())]),
             param_grid={'kneighborsclassifier__n_neighbors': [3, 4, 5, 10, 50],
                         'kneighborsclassifier__weights': ['uniform',
                                                           'distance']})

Example: MNIST Digits

In [6]:
print(gscv.score(X_test, y_test))
fig
0.9699666295884316
Out[6]:
[figure output omitted]

Example: MNIST Digits

In [7]:
gscv.best_params_
Out[7]:
{'kneighborsclassifier__n_neighbors': 4,
 'kneighborsclassifier__weights': 'distance'}

How to find those names?

model.get_params() - lists all parameters known to a model

pipeline.make_pipeline:

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

Explicitly constructing a pipeline allows you to set the names yourself:

from sklearn import decomposition, linear_model, pipeline

# Pipeline takes a list of (name, estimator) pairs;
# renamed to pipe so we do not shadow the pipeline module
pipe = pipeline.Pipeline([
    ("pca", decomposition.PCA()),
    ("lasso", linear_model.Lasso())])

Parameters in a pipeline are named [componentname]__[parametername].
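With these explicit names the grid keys become short; a sketch (the value lists are illustrative, not from the original slides):

# "pca" and "lasso" are the names chosen above
parameters = {'pca__n_components': [2, 5, 10],
              'lasso__alpha': [0.1, 1.0, 10.0]}
gscv = model_selection.GridSearchCV(pipe, parameters, cv=5)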

How to find those names?

model.get_params() - lists all parameters known to a model

compose.make_column_transformer:

This is a shorthand for the ColumnTransformer constructor; it does not require, and does not permit, naming the transformers. Instead, they will be given names automatically based on their types.

Explicitly constructing a column transformer allows you to set the names yourself:

from sklearn import compose

# ColumnTransformer takes a list of (name, transformer, columns) triples;
# numeric_cleanup / categorical_cleanup are transformers defined elsewhere
transformer = compose.ColumnTransformer([
    ("numeric", numeric_cleanup, numeric_columns),
    ("categorical", categorical_cleanup, categorical_columns)])

Parameters in a column transformer are named [componentname]__[parametername].

How to find those names?

Example using the titanic cleanup chain we have used repeatedly:

>>> cleanup.get_params()

[..........]

 'pipeline-1__simpleimputer__strategy': 'median',
 'pipeline-2__simpleimputer__strategy': 'constant',
 'pipeline-2__simpleimputer__fill_value': 'NA',
 'pipeline-2__onehotencoder__handle_unknown': 'ignore',
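These names work anywhere a parameter is addressed, e.g. in set_params or as GridSearchCV keys; a minimal sketch using the names listed above:

# switch the numeric imputation strategy by its full name
# (dict unpacking is needed because the name contains a hyphen)
cleanup.set_params(**{'pipeline-1__simpleimputer__strategy': 'mean'})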

Classifiers

  • Logistic Regression
  • $k$-Nearest Neighbors
  • Decision Tree
  • Random Forest
  • AdaBoost
  • Gradient Boosting

Binary Classifiers

  • Logistic Regression

Multi-Class Classifiers

  • $k$-Nearest Neighbors
  • Decision Tree
  • Random Forest
  • AdaBoost
  • Gradient Boosting

Binary Classifier: pick $A$ or $B$.

Multi-Class Classifier: pick one of $A_1,\dots,A_k$.

For our current competition we have 10 classes, so a binary classifier will not work.

One option: use a multi-class probability model in Logistic Regression (set multi_class="multinomial").

Another option: Ensemble Classifiers.

Ensemble Classifier

  • One-vs-Rest: Train one separate classifier to recognize each class. Use whichever classifier is most confident.
  • One-vs-One: Train a classifier for each pair of classes. Let the classifiers vote.

Ensemble Classifier

                 One-vs-Rest     One-vs-One
Dataset size     full dataset    subset (only the two classes)
Training rounds  $O(k)$          $O(k^2)$

For methods whose training cost scales badly with dataset size, One-vs-One may be faster overall, even though many more models are trained.
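A rough worked example (assuming, for illustration, a training cost quadratic in the number of samples $n$, with $k$ balanced classes): One-vs-Rest trains $k$ models on all $n$ samples, costing $O(k \cdot n^2)$; One-vs-One trains $\binom{k}{2} \approx k^2/2$ models, each on only $2n/k$ samples, costing roughly $\frac{k^2}{2}\left(\frac{2n}{k}\right)^2 = 2n^2$, independent of $k$.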

Multi-Class Classifiers

In sklearn:

Inherently Multi-Class

  • linear_model.LogisticRegression(multi_class="multinomial")
  • neighbors.KNeighborsClassifier
  • tree.DecisionTreeClassifier
  • ensemble.RandomForestClassifier
  • ensemble.AdaBoostClassifier
  • ensemble.GradientBoostingClassifier

One-vs-Rest

  • linear_model.LogisticRegression(multi_class="ovr")
  • multiclass.OneVsRestClassifier - meta classifier. Takes a classifier object.

One-vs-One

  • Several Support Vector Machine systems (to be discussed later)
  • multiclass.OneVsOneClassifier - meta classifier. Takes a classifier object.
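
Both meta classifiers take any binary classifier; a minimal usage sketch (the choice of logistic regression as the base classifier is illustrative):

from sklearn import linear_model, multiclass

# turn a binary classifier into a multi-class one by pairwise voting
ovo = multiclass.OneVsOneClassifier(linear_model.LogisticRegression())
ovo.fit(X_train, y_train)
print(ovo.score(X_test, y_test))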