Machines are learning from others’ experiences. What about you?

ismail aslan
3 min read · Mar 23, 2021

It is difficult for us to learn from other people's experience. But what about computers? TensorFlow makes it easy for everybody.

As I stated above, it would be wonderful for us, human beings, to learn from other people's experiences. However, that is not so easy for us (at least for me :) ). In DL, on the other hand, it is easy, and it is a must-have tool in an analyst's backpack. With transfer learning, our model's accuracy can go through the roof. The reasons are:

The transferred model has already learned very important features:
- on a very large dataset,
- over a long training time,
- using a lot of computational power,

some of which might not be available to us. I like the analogy of putting on glasses to see things you cannot see with the naked eye. That is exactly what transfer learning does: it makes the images visible to your model.

So why not use an already-trained model?

How does it work?

As you might know, a very basic DL model has some convolutional layers, which extract features from the image dataset, and some dense layers, which make the predictions using those features.
What does "feature" mean in this context? Features are the curves, corners, and lines that the convolutional layers bring forward. So in a huge model trained on lots of data, the model extracts lots of features to maximise accuracy. With transfer learning, we take off that model's top layers, which are dedicated to predicting its original, specific problem, and replace them with our own.
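Here is a minimal sketch of that idea using keras.applications (the actual example below uses TensorFlow Hub instead, and the 3-class head is just a placeholder):

import tensorflow as tf
from tensorflow import keras

# a frozen convolutional base that already learned generic features
# (MobileNetV2 is only an illustrative choice here)
base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the original classification head
    weights="imagenet")
base.trainable = False   # keep the transferred features fixed

# a new head for *our* problem (3 classes as a placeholder)
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(3)
])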

Here, as an example, I will use the beans dataset, which is available in TensorFlow Datasets. To keep things short and understandable, I will present the whole code at once. In a real-life problem the main difference will be the data preparation; other than that, you can reuse the rest with small adjustments (for example, the number of classes in your problem should be changed in the very last layer).

# required libraries
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import numpy as np
# load dataset
datasets, info = tfds.load(name='beans', with_info=True, as_supervised=True)
beans_train, beans_test, beans_validation = datasets['train'], datasets['test'], datasets['validation']
# function for preparing data - normalize and resize
def format_sample(image, label):
    image = tf.cast(image, tf.float32)
    image = image / 255.
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

IMG_SIZE = 224
# prepare data by applying the above function
train = beans_train.map(format_sample)
validation = beans_validation.map(format_sample)
test = beans_test.map(format_sample)

BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000
# prepare dataset: shuffle and take a batch of data from dataset
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
# download the pretrained net
feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2"

feature_extractor_layer = hub.KerasLayer(feature_extractor_url,
                                         input_shape=(IMG_SIZE, IMG_SIZE, 3))

feature_extractor_layer.trainable = False
# add layers to adapt the model to our problem
model = keras.Sequential([
    feature_extractor_layer,
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(3)  # 3 --> number of classes
])
# compile
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['acc'])
# train
history = model.fit(train_batches, epochs=10, validation_data=validation_batches)

Let’s take a look at the model summary:
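(The summary itself can be printed with model.summary(); the parameter counts in the comments are the ones from this run.)

model.summary()
# Total params:         2,422,339
# Trainable params:       164,355
# Non-trainable params: 2,257,984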

As you see above, with almost fifty lines of code we can adapt a pre-trained deep neural network to the problem at hand and get very good results. Instead of training all 2,422,339 parameters, we train only the model's last layers, just 164,355 parameters. That is about 7% of all parameters. And believe me, the remaining 93% are really well-trained parameters.
What we are doing in the background is simply transferring the pre-trained model's experience to our problem and putting a classification head on top of it.

Epoch 10/10
33/33 [==============================] - 20s 608ms/step - loss: 0.0219 - acc: 0.9981 - val_loss: 0.2024 - val_acc: 0.9098

Using the above model, we reach a validation accuracy of 0.9098, which is quite good. So I would definitely recommend using a pre-trained model at some phase of your analysis.
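If you also want a score on the held-out test split (not shown in the run above), Keras' evaluate works directly on the batched test dataset we built earlier:

# evaluate on the test batches prepared above (returns loss and accuracy)
test_loss, test_acc = model.evaluate(test_batches)
print("test accuracy:", round(test_acc, 4))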

I would recommend watching out for the following:
- What is the size of each sample in your dataset?
- What is the input shape of the feature extractor layer you transferred?
- What is the format of your labels: one-hot or plain integers?
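For the last point, the label format determines which loss to pick; a small sketch, assuming the same 3-class setup as above:

# integer labels (0, 1, 2), as in the beans dataset:
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# one-hot labels ([1, 0, 0], [0, 1, 0], ...): use the non-sparse version instead
# loss = keras.losses.CategoricalCrossentropy(from_logits=True)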

Good luck, and try to learn from other people's experiences!
