Transfer Learning¶

Modern deep learning works with very deep neural networks and very large training sets, which take a great deal of computation to train on. A common training set is ImageNet: about 14 million hand-annotated images tagged with more than 20,000 categories.

In return for all this investment, we get general-purpose computer vision systems.

We can leverage this generic training for specific applications by removing the last few layers, replacing them with our own sequence of dense layers (a new "head"), and training just this head.

Pre-Trained Networks in Keras¶

In Keras, we have a number of pre-trained networks available in tf.keras.applications, all of them pre-trained on ImageNet:

  • DenseNet (2016) - Interleaves convolutional/max-pooling layers with novel dense blocks: sequences of convolutional layers where each new layer takes the outputs of ALL preceding layers in the block as input. Comes in versions 121, 169, and 201, where the number counts the total number of (non-pooling) layers.
  • EfficientNet (2019) - Scales up depth (number of layers), width (number of features), and resolution (image size) simultaneously. The baseline B0 has 7 blocks of mobile inverted bottleneck convolutional layers, and B1 through B7 scale this design up. An updated V2 architecture was published in 2021.
  • Inception (2014) - Developed by Google, and revised twice since. The core idea is the inception module, in which stacks of convolutional layers run in parallel and are combined at the end.
  • ResNet (2015) - Introduces skip connections: data is put through a stack of convolutional layers and also carried directly around it to a combining layer, repeatedly. Available in Keras with 50, 101, or 152 total layers.
  • Inception-ResNet (2016) - Combines ResNet's skip connections with inception modules.
  • MobileNet (2017, 2018) - Introduces depth-wise separable convolutional layers to make deep networks small and fast enough for mobile phone platforms. v2 (2018) adds ResNet-style skip connections, but is careful to skip between bottlenecks.
  • NASNet (2017) - Uses a controller neural network to pick what operations to do in what order.
  • VGG (2015) - Short stacks of convolutional layers are interleaved with pooling layers, for a total of 16 or 19 weight layers. These short stacks of small kernels simulate larger convolutional kernels while using fewer parameters.
  • Xception (2017) - Stacks of depth-wise separable convolutional layers (as in MobileNet), with some ResNet-style skip connections.
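
As a quick orientation, the sketch below loads a few of these architectures from tf.keras.applications and prints their parameter counts. The constructors named here (DenseNet121, EfficientNetB0, ResNet50, MobileNetV2, Xception) are exposed by that module, and the ImageNet weights are downloaded on first use:

from tensorflow.keras import applications

# Compare a few pre-trained architectures by size.
# include_top=False drops the ImageNet classifier head, as we will for transfer learning.
for name, builder in [
    ('DenseNet121', applications.DenseNet121),
    ('EfficientNetB0', applications.EfficientNetB0),
    ('ResNet50', applications.ResNet50),
    ('MobileNetV2', applications.MobileNetV2),
    ('Xception', applications.Xception),
]:
    model = builder(weights='imagenet', include_top=False)
    print(f'{name}: {model.count_params():,} parameters')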

Transfer Learning in Keras¶

All these pre-trained networks have the same usage pattern. I will show here how to use VGG-16:

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import vgg16

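# IMAGE_SIZE (input height/width), CLASSES (list of label names), and strategy
# (a tf.distribute strategy) are assumed to be defined earlier in the notebook.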
base_model = vgg16.VGG16(
    include_top=False,  # put in my own classifier layers
    weights='imagenet', # use pre-trained weights
    pooling='avg',      # flatten with global average pooling
    input_shape=[*IMAGE_SIZE, 3] # use our input images
)
base_model.trainable = False # freeze the pre-trained base; only the new head will be trained
with strategy.scope():
    inputs = keras.Input(shape=[*IMAGE_SIZE, 3])
    x = vgg16.preprocess_input(inputs)
    x = base_model(x, training=False)
    outputs = layers.Dense(len(CLASSES), activation="softmax")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
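
With the head attached, compiling and fitting follow the usual Keras pattern. A minimal sketch, assuming train_ds and valid_ds are tf.data.Dataset pipelines of (image, label) batches built earlier, with integer class labels:

with strategy.scope():
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',  # integer labels; use categorical_crossentropy for one-hot
        metrics=['sparse_categorical_accuracy']
    )

history = model.fit(train_ds, validation_data=valid_ds, epochs=5)

Once the new head has converged, you can optionally set base_model.trainable = True, recompile, and continue training with a much lower learning rate to fine-tune the whole network.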