Hyperparameter tuning

Models have parameters

  • linear regression coefficients
  • locations of cluster centers
  • ...

The process of training a model has hyperparameters

  • regularization strength (for Lasso / Ridge Regression)
  • $k$ (for $k$-nearest neighbors)
  • dimension (for PCA dimensionality reduction)
  • ...

Hyperparameter Tuning

from sklearn import neighbors, model_selection

model1 = neighbors.KNeighborsClassifier(3)
model2 = neighbors.KNeighborsClassifier(4)
model3 = neighbors.KNeighborsClassifier(5)
model4 = neighbors.KNeighborsClassifier(10)
model5 = neighbors.KNeighborsClassifier(50)

scores1 = model_selection.cross_val_score(model1, X, y)
scores2 = model_selection.cross_val_score(model2, X, y)
scores3 = model_selection.cross_val_score(model3, X, y)
scores4 = model_selection.cross_val_score(model4, X, y)
scores5 = model_selection.cross_val_score(model5, X, y)

Hyperparameter Tuning

ks = [3,4,5,10,50]
models = [neighbors.KNeighborsClassifier(k) for k in ks]
scores = [model_selection.cross_val_score(model,X,y) 
          for model in models]
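Each call to cross_val_score returns one score per fold; a minimal sketch (assuming numpy is imported as np) for picking the best $k$ by mean score:

import numpy as np

# average over folds, then take the k whose mean score is highest
mean_scores = [s.mean() for s in scores]
best_k = ks[int(np.argmax(mean_scores))]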

Hyperparameter Tuning

ks = [3,4,5,10,50]
weights = ["uniform", "distance"]
models = [neighbors.KNeighborsClassifier(k, weights=w) 
          for k in ks for w in weights]
scores = [model_selection.cross_val_score(model,X,y) 
          for model in models]

This is called grid search: systematically trying all combinations of the hyperparameter values (imagine a grid with one axis per hyperparameter) to find the best option.

Hyperparameter Tuning

parameters = {
    'kneighborsclassifier__n_neighbors': [3,4,5,10,50],
    'kneighborsclassifier__weights': ["uniform", "distance"]  
}
from sklearn import pipeline, preprocessing

model = pipeline.make_pipeline(preprocessing.StandardScaler(),
                               neighbors.KNeighborsClassifier())
gscv = model_selection.GridSearchCV(model, parameters, cv=5)
gscv.fit(X, y)

You can also give a list of parameter dictionaries to GridSearchCV, if the availability of some parameters depends on the value of other parameters.
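For instance (an illustrative sketch, not from the original slides): for $k$-nearest neighbors, the Minkowski exponent p is only meaningful when metric="minkowski", so each dictionary lists only combinations that are valid together:

parameters = [
    # p only applies to the Minkowski metric
    {'kneighborsclassifier__metric': ['minkowski'],
     'kneighborsclassifier__p': [1, 2, 3]},
    # other metrics have no p parameter
    {'kneighborsclassifier__metric': ['chebyshev']},
]
gscv = model_selection.GridSearchCV(model, parameters, cv=5)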

Hyperparameter Tuning

A GridSearchCV object has important members:

Member            Contains
.predict()        calls predict() on the best estimator
.transform()      calls transform() on the best estimator
.best_estimator_  the best estimator found by the search
.best_score_      the cross-validation score of the best estimator
.best_params_     the parameters of the best estimator
.cv_results_      detailed results (timings, parameter combinations, scores) for all CV runs
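To inspect .cv_results_, a handy pattern (a sketch assuming pandas is imported as pd) is to load it into a DataFrame:

import pandas as pd

# one row per parameter combination; rank_test_score is 1 for the winner
results = pd.DataFrame(gscv.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))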

Example: MNIST Digits

In [3]:
fig
Out[3]:
[figure output omitted]

Example: MNIST Digits

In [4]:
parameters = {
    'kneighborsclassifier__n_neighbors': [3,4,5,10,50],
    'kneighborsclassifier__weights': ["uniform", "distance"]
}
model = pipeline.make_pipeline(preprocessing.StandardScaler(), 
                               neighbors.KNeighborsClassifier())
gscv = model_selection.GridSearchCV(model, parameters, cv=5)
gscv.fit(X_train, y_train)
Out[4]:
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('standardscaler', StandardScaler()),
                                       ('kneighborsclassifier',
                                        KNeighborsClassifier())]),
             param_grid={'kneighborsclassifier__n_neighbors': [3, 4, 5, 10, 50],
                         'kneighborsclassifier__weights': ['uniform',
                                                           'distance']})

Example: MNIST Digits

In [6]:
print(gscv.score(X_test, y_test))
fig
0.9699666295884316
Out[6]:
[figure output omitted]

Example: MNIST Digits

In [7]:
gscv.best_params_
Out[7]:
{'kneighborsclassifier__n_neighbors': 4,
 'kneighborsclassifier__weights': 'distance'}

How to find those names?

model.get_params() - lists all parameters known to a model

pipeline.make_pipeline:

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

Explicitly constructing a pipeline allows you to set the names yourself:

from sklearn import decomposition, linear_model, pipeline

# Pipeline takes a list of (name, estimator) pairs;
# renamed to pipe so we do not shadow the pipeline module
pipe = pipeline.Pipeline([
    ("pca", decomposition.PCA()),
    ("lasso", linear_model.Lasso())])

Parameters in a pipeline are named [componentname]__[parametername].
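With these explicit names the grid keys become short; a sketch (the value lists are illustrative, not from the original slides):

# "pca" and "lasso" are the names chosen above
parameters = {'pca__n_components': [2, 5, 10],
              'lasso__alpha': [0.1, 1.0, 10.0]}
gscv = model_selection.GridSearchCV(pipe, parameters, cv=5)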

How to find those names?

model.get_params() - lists all parameters known to a model

compose.make_column_transformer:

This is a shorthand for the ColumnTransformer constructor; it does not require, and does not permit, naming the transformers. Instead, they will be given names automatically based on their types.

Explicitly constructing a column transformer allows you to set the names yourself:

from sklearn import compose

# ColumnTransformer takes a list of (name, transformer, columns) triples;
# numeric_cleanup / categorical_cleanup are transformers defined elsewhere
transformer = compose.ColumnTransformer([
    ("numeric", numeric_cleanup, numeric_columns),
    ("categorical", categorical_cleanup, categorical_columns)])

Parameters in a column transformer are named [componentname]__[parametername].

How to find those names?

Example using the titanic cleanup chain we have used repeatedly:

>>> cleanup.get_params()

[..........]

 'pipeline-1__simpleimputer__strategy': 'median',
 'pipeline-2__simpleimputer__strategy': 'constant',
 'pipeline-2__simpleimputer__fill_value': 'NA',
 'pipeline-2__onehotencoder__handle_unknown': 'ignore',
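These names work anywhere a parameter is addressed, e.g. in set_params or as GridSearchCV keys; a minimal sketch using the names listed above:

# switch the numeric imputation strategy by its full name
# (dict unpacking is needed because the name contains a hyphen)
cleanup.set_params(**{'pipeline-1__simpleimputer__strategy': 'mean'})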

Classifiers

  • Logistic Regression
  • $k$-Nearest Neighbors
  • Decision Tree
  • Random Forest
  • AdaBoost
  • Gradient Boosting

Binary Classifiers

  • Logistic Regression

Multi-Class Classifiers

  • $k$-Nearest Neighbors
  • Decision Tree
  • Random Forest
  • AdaBoost
  • Gradient Boosting

Binary Classifier: pick $A$ or $B$.

Multi-Class Classifier: pick one of $A_1,\dots,A_k$.

For our current competition we have 10 classes, so a binary classifier will not work.

One option: use a multi-class probability model in Logistic Regression (set multi_class="multinomial").

Another option: Ensemble Classifiers.

Ensemble Classifier

  • One-vs-Rest: Train one separate classifier to recognize each class. Use whichever classifier is most confident.
  • One-vs-One: Train a classifier for each pair of classes. Let the classifiers vote.

Ensemble Classifier

                 One-vs-Rest     One-vs-One
Dataset size     full dataset    subset (only the two classes)
Training rounds  $O(k)$          $O(k^2)$

For methods whose training cost scales badly with dataset size, One-vs-One may be faster overall, even though many more models are trained.
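A rough worked example (assuming, for illustration, a training cost quadratic in the number of samples $n$, with $k$ balanced classes): One-vs-Rest trains $k$ models on all $n$ samples, costing $O(k \cdot n^2)$; One-vs-One trains $\binom{k}{2} \approx k^2/2$ models, each on only $2n/k$ samples, costing roughly $\frac{k^2}{2}\left(\frac{2n}{k}\right)^2 = 2n^2$, independent of $k$.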

Multi-Class Classifiers

In sklearn:

Inherently Multi-Class

  • linear_model.LogisticRegression(multi_class="multinomial")
  • neighbors.KNeighborsClassifier
  • tree.DecisionTreeClassifier
  • ensemble.RandomForestClassifier
  • ensemble.AdaBoostClassifier
  • ensemble.GradientBoostingClassifier

One-vs-Rest

  • linear_model.LogisticRegression(multi_class="ovr")
  • multiclass.OneVsRestClassifier - meta classifier. Takes a classifier object.

One-vs-One

  • Several Support Vector Machine systems (to be discussed later)
  • multiclass.OneVsOneClassifier - meta classifier. Takes a classifier object.
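
Both meta classifiers take any binary classifier; a minimal usage sketch (the choice of logistic regression as the base classifier is illustrative):

from sklearn import linear_model, multiclass

# turn a binary classifier into a multi-class one by pairwise voting
ovo = multiclass.OneVsOneClassifier(linear_model.LogisticRegression())
ovo.fit(X_train, y_train)
print(ovo.score(X_test, y_test))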