Modern deep learning works with very deep neural networks trained on very large training sets that take enormous computation. A common training set is ImageNet: 14 million hand-annotated images tagged with more than 20,000 categories.
For all this investment, we get general-purpose computer vision systems.
We can leverage this generic training for specific applications by removing the last few layers, replacing them with our own sequence of dense layers, and training just this new head.
In Keras, we have a number of pre-trained networks available in tf.keras.applications, all of them pre-trained on ImageNet. EfficientNet B0, for example, has 7 blocks of mobile inverted bottleneck convolutional layers, and B1 through B7 scale this design up; an updated V2 architecture was published in 2021. All these pre-trained networks have the same usage pattern. I will show here how to use VGG-16:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import vgg16

base_model = vgg16.VGG16(
    include_top=False,            # put in my own classifier layers
    weights='imagenet',           # use pre-trained weights
    pooling='avg',                # flatten with global average pooling
    input_shape=[*IMAGE_SIZE, 3]  # use our input images
)
base_model.trainable = False      # freeze the pre-trained weights; train only the head

with strategy.scope():
    inputs = keras.Input(shape=[*IMAGE_SIZE, 3])
    x = vgg16.preprocess_input(inputs)
    x = base_model(x, training=False)
    outputs = layers.Dense(len(CLASSES), activation="softmax")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
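The pooling='avg' argument is what lets us attach a Dense layer directly: it collapses each convolutional feature map to its spatial mean, producing one number per channel. A minimal NumPy sketch of that operation (the shapes here are illustrative, not taken from VGG-16's actual configuration):

```python
import numpy as np

# Simulated convolutional feature maps: batch of 2 images,
# a 7x7 spatial grid, 512 channels
features = np.random.rand(2, 7, 7, 512)

# Global average pooling: average over the two spatial axes,
# turning each 7x7 map into a single number per channel
pooled = features.mean(axis=(1, 2))

print(pooled.shape)  # (2, 512)
```

Each image is thus reduced to a fixed-length 512-dimensional vector regardless of the input's spatial size, which is why the classifier head does not need a Flatten layer.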