Scene Classification with Grad-CAM Visualization
An implementation of Gradient-weighted Class Activation Mapping (Grad-CAM) on top of a ResNet18-style network, uncovering which regions of an image most strongly influence the network's prediction.
Overview
This notebook examines the behavior of a visual explanation method for a deep learning model. The model is trained to classify each image in this dataset into 6 classes (buildings, forest, glacier, mountain, sea, street). The architecture is a self-prepared ResNet18-style network. The visual explanation method examined is
- Grad-CAM https://arxiv.org/abs/1610.02391
Why is this Useful?
- To help deep learning practitioners visually debug their models and properly understand where a model is “looking” in an image, Selvaraju et al. created Gradient-weighted Class Activation Mapping, or more simply, Grad-CAM.
- Grad-CAM uses the gradients of any target concept (say, the logit for “dog”, or even a caption), flowing into the final convolutional layer, to produce a coarse localization map highlighting the regions in the image that are important for predicting the concept.
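Concretely, for a class $c$ and the feature maps $A^k$ of the last convolutional layer, Grad-CAM first pools the gradients into per-channel weights and then forms a ReLU-weighted sum (notation follows the paper):

$$\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$

where $y^c$ is the score for class $c$, $Z$ is the number of spatial positions, and the ReLU keeps only the features with a positive influence on the class of interest.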
!pip install -q kaggle
from google.colab import files
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d puneet6060/intel-image-classification
# Extract into a folder matching the path used below.
!unzip -q /content/intel-image-classification.zip -d /content/intel-image-classification
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, LearningRateScheduler
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from random import sample
from IPython.display import display
import os
import PIL
train = {}
test = {}
path = "/content/intel-image-classification"
# Make dictionary storing images for each category under train data.
path_train = os.path.join(path, "seg_train/seg_train")
for i in os.listdir(path_train):
train[i] = os.listdir(os.path.join(path_train, i))
# Make dictionary storing images for each category under test data.
path_test = os.path.join(path, "seg_test/seg_test")
for i in os.listdir(path_test):
test[i] = os.listdir(os.path.join(path_test, i))
len_train = np.concatenate(list(train.values())).shape[0]
len_test = np.concatenate(list(test.values())).shape[0]
print("Number of images in training data : {}".format(len_train))
print("Number of images in testing data : {}".format(len_test))
# You will see different images each time.
fig, axs = plt.subplots(6, 5, figsize = (15, 15))
for i, item in enumerate(os.listdir(path_train)):
images = sample(train[item], 5)
for j, image in enumerate(images):
img = PIL.Image.open(os.path.join(path_train, item, image))
axs[i, j].imshow(img)
axs[i, j].set(xlabel = item, xticks = [], yticks = [])
fig.tight_layout()
for item in train.keys():
print(item, len(train[item]))
# This is often useful when you want your dataset to be balanced.
fig, ax = plt.subplots()
ax.pie(
[len(train[item]) for item in train],
labels = train.keys(),
autopct = "%1.1f%%"
)
fig.show()
# Here we go with zooming, flipping (horizontally and vertically), and rescaling.
train_datagen = ImageDataGenerator(
zoom_range = 0.2,
horizontal_flip = True,
vertical_flip = True,
rescale=1./255
)
# For test data we only rescale the data.
# Never augment test data!!!
test_datagen = ImageDataGenerator(rescale=1./255)
# This will make images (including augmented ones) start flowing from the directory to the model.
# Note that augmented images are not stored along with the original images. The process happens in memory.
# Train generator
train_generator = train_datagen.flow_from_directory(
path_train,
target_size=(256, 256),
batch_size=32,
class_mode='categorical'
)
# Test generator
test_generator = test_datagen.flow_from_directory(
path_test,
target_size=(256, 256),
batch_size=32,
class_mode='categorical'
)
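A quick sanity check on what the generators yield, one batch at a time (shapes assume the settings above):
x_batch, y_batch = next(train_generator)
print(x_batch.shape, y_batch.shape)  # expected: (32, 256, 256, 3) (32, 6)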
# You can use a different architecture if you like (a transfer-learning sketch follows the model summary below).
def res_block(X, filter, stage):
# Convolutional_block
X_copy = X
f1 , f2, f3 = filter
# Main Path
X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_conv_a', kernel_initializer= glorot_uniform(seed = 0))(X)
X = MaxPool2D((2,2))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_a')(X)
X = Activation('relu')(X)
X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_conv_b', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_b')(X)
X = Activation('relu')(X)
X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_c', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_c')(X)
# Short path
X_copy = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_copy', kernel_initializer= glorot_uniform(seed = 0))(X_copy)
X_copy = MaxPool2D((2,2))(X_copy)
X_copy = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_copy')(X_copy)
# ADD
X = Add()([X, X_copy])
X = Activation('relu')(X)
# Identity Block 1
X_copy = X
# Main Path
X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_1_a', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_a')(X)
X = Activation('relu')(X)
X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_1_b', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_b')(X)
X = Activation('relu')(X)
X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_1_c', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis = 3, name = 'bn_'+str(stage)+'_identity_1_c')(X)
# ADD
X = Add()([X, X_copy])
X = Activation('relu')(X)
# Identity Block 2
X_copy = X
# Main Path
X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_2_a', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_a')(X)
X = Activation('relu')(X)
X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_2_b', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_b')(X)
X = Activation('relu')(X)
X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_2_c', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_c')(X)
# ADD
X = Add()([X, X_copy])
X = Activation('relu')(X)
return X
input_shape = (256,256,3)
# Input tensor shape
X_input = Input(input_shape)
# Zero-padding
X = ZeroPadding2D((3,3))(X_input)
# 1- stage
X = Conv2D(64, (7,7), strides= (2,2), name = 'conv1', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides= (2,2))(X)
# 2- stage
X = res_block(X, filter= [64,64,256], stage= 2)
# 3- stage
X = res_block(X, filter= [128,128,512], stage= 3)
# 4- stage
X = res_block(X, filter= [256,256,1024], stage= 4)
# 5- stage
X = res_block(X, filter= [512,512,2048], stage= 5)
# Average Pooling
X = AveragePooling2D((2,2), name = 'Average_Pooling')(X)
# Final layer
X = Flatten()(X)
X = Dropout(0.4)(X)
X = Dense(6, activation = 'softmax', name = 'Dense_final', kernel_initializer= glorot_uniform(seed=0))(X)
# Build model.
model = Model(
inputs= X_input,
outputs = X,
name = 'Resnet18'
)
# Check out model summary.
model.summary()
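As noted above, the backbone is interchangeable. Here is a minimal transfer-learning sketch using the ResNet50 imported earlier (a hypothetical alternative; the rest of the notebook keeps the hand-built model):
# Pretrained ResNet50 base without its classification head, plus a small head for 6 classes.
base = ResNet50(weights = "imagenet", include_top = False, input_shape = (256, 256, 3))
x = GlobalAveragePooling2D()(base.output)
x = Dense(6, activation = "softmax")(x)
alt_model = Model(inputs = base.input, outputs = x, name = "Resnet50_transfer")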
model.compile(
optimizer = "adam",
loss = "categorical_crossentropy",
metrics = ["accuracy"]
)
earlystopping = EarlyStopping(
monitor = 'loss',
mode = 'min',
verbose = 1,
patience = 15
)
# Save the best model. Since no validation set is passed to fit, we monitor the
# training loss (the default 'val_loss' would never be available here).
checkpointer = ModelCheckpoint(
    filepath = "weights.hdf5",
    monitor = 'loss',
    verbose = 1,
    save_best_only = True
)
# Here we train for 5 epochs for demonstration.
history = model.fit(
    train_generator,
    steps_per_epoch = train_generator.n // 32,
    epochs = 5,
    callbacks = [
        checkpointer,
        earlystopping
    ]
)
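Optionally reload the best checkpoint written by the callback above before evaluating (assumes weights.hdf5 was saved during training):
model.load_weights("weights.hdf5")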
evaluate = model.evaluate(
    test_generator,
    steps = test_generator.n // 32,
    verbose = 1
)
labels = {
0: 'buildings',
1: 'forest',
2: 'glacier',
3: 'mountain',
4: 'sea',
5: 'street'
}
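These indices follow flow_from_directory, which assigns class indices to the class folders in alphabetical order; you can verify the mapping assumed above directly:
print(train_generator.class_indices)
# expected, given alphabetical folder ordering:
# {'buildings': 0, 'forest': 1, 'glacier': 2, 'mountain': 3, 'sea': 4, 'street': 5}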
prediction = []
original = []
image = []
count = 0
for i in os.listdir(path_test):
for item in os.listdir(os.path.join(path_test, i)):
# code to open the image
img= PIL.Image.open(os.path.join(path_test, i, item))
# resizing the image to (256,256)
img = img.resize((256, 256))
# appending image to the image list
image.append(img)
# converting image to array
img = np.asarray(img, dtype = np.float32)
# normalizing the image
img = img / 255
# reshaping the image into a 4D array
img = img.reshape(-1, 256, 256, 3)
# making prediction of the model
predict = model.predict(img)
# getting the index corresponding to the highest value in the prediction
predict = np.argmax(predict)
# appending the predicted class to the list
prediction.append(labels[predict])
# appending original class to the list
original.append(i)
score = accuracy_score(original, prediction)
print("Test Accuracy : {}".format(score))
fig = plt.figure(figsize = (100,100))
for i in range(20):
    j = random.randint(0, len(image) - 1)
    fig.add_subplot(20, 1, i + 1)
    plt.xlabel("Prediction: " + prediction[j] + "  Original: " + original[j])
    plt.imshow(image[j])
fig.tight_layout()
plt.show()
print(classification_report(np.asarray(original), np.asarray(prediction)))
# Based on these values, you can try to improve your model.
# For the sake of simplicity, hyperparameter tuning and model improvement were not done.
plt.figure(figsize = (7, 5))
cm = confusion_matrix(np.asarray(original), np.asarray(prediction))
sns.heatmap(
cm,
annot = True,
fmt = "d"
)
plt.show()
def grad_cam(img):
# Convert the image to array of type float32
img = np.asarray(img, dtype = np.float32)
# Reshape the image from (256,256,3) to (1,256,256,3)
img = img.reshape(-1, 256, 256, 3)
img_scaled = img / 255
# Names of the average-pooling and final dense layers (you can see these names in the model summary)
classification_layers = ["Average_Pooling", "Dense_final"]
# Last convolutional layer in the model
final_conv = model.get_layer("res_5_identity_2_c")
# Create a model with original model inputs and the last conv_layer as the output
final_conv_model = keras.Model(model.inputs, final_conv.output)
# Then we create the input for classification layer, which is the output of last conv layer
# In our case, output produced by the conv layer is of the shape (1,3,3,2048)
# Since the classification input needs the features as input, we ignore the batch dimension
classification_input = keras.Input(shape = final_conv.output.shape[1:])
# We iterate through the named classification layers to rebuild the head on top of
# the feature map. Note that the Flatten and Dropout layers are not reapplied here,
# so the Dense layer runs on the pooled 4D map and the output has shape (1,1,1,6);
# the indexing below accounts for this.
temp = classification_input
for layer in classification_layers:
    temp = model.get_layer(layer)(temp)
classification_model = keras.Model(classification_input, temp)
# We use gradient tape to monitor the 'final_conv_output' to retrieve the gradients
# corresponding to the predicted class
with tf.GradientTape() as tape:
# Pass the image through the base model and get the feature map
final_conv_output = final_conv_model(img_scaled)
# Assign gradient tape to monitor the conv_output
tape.watch(final_conv_output)
# Pass the feature map through the classification model and use argmax to get the
# index of the predicted class and then use the index to get the value produced by final
# layer for that class
prediction = classification_model(final_conv_output)
predicted_class = tf.argmax(prediction[0][0][0])
predicted_class_value = prediction[:,:,:,predicted_class]
# Get the gradient corresponding to the predicted class based on feature map.
# which is of shape (1,3,3,2048)
gradient = tape.gradient(predicted_class_value, final_conv_output)
# Average the gradients over the batch and spatial dimensions to get one weight
# per channel, which results in a shape of (2048,)
gradient_channels = tf.reduce_mean(gradient, axis=(0, 1, 2))
# We then convert the feature map produced by the last conv layer from (1,3,3,2048) to (3,3,2048)
final_conv_output = final_conv_output.numpy()[0]
gradient_channels = gradient_channels.numpy()
# We weight each channel of the feature map by its pooled gradient. This increases
# the values of areas that helped in making the prediction and lowers the values of
# areas that did not contribute towards the final prediction
for i in range(gradient_channels.shape[-1]):
final_conv_output[:, :, i] *= gradient_channels[i]
# We take the mean across the channels to get the heatmap
heatmap = np.mean(final_conv_output, axis=-1)
# Normalize the heatmap to [0, 1] for visualization (the epsilon guards against division by zero)
heatmap_normalized = np.maximum(heatmap, 0) / (np.max(heatmap) + 1e-8)
# Rescaling and converting the type to int
heatmap = np.uint8(255 * heatmap_normalized )
# Create the colormap
color_map = plt.cm.get_cmap('jet')
# keep only the RGB channels of the colormap (drop alpha)
color_map = color_map(np.arange(256))[:, :3]
heatmap = color_map[heatmap]
# convert the array to image, resize the image and then convert to array
heatmap = keras.preprocessing.image.array_to_img(heatmap)
heatmap = heatmap.resize((256, 256))
heatmap = np.asarray(heatmap, dtype = np.float32)
# Add the heatmap on top of the original image
final_img = heatmap * 0.4 + img[0]
final_img = keras.preprocessing.image.array_to_img(final_img)
return final_img, heatmap_normalized
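A quick single-image check before the grid below (uses the image list built during evaluation):
overlay, hm = grad_cam(image[0])
fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (8, 4))
ax1.imshow(overlay)
ax1.set(title = "Grad-CAM overlay", xticks = [], yticks = [])
ax2.imshow(hm)
ax2.set(title = "Heatmap", xticks = [], yticks = [])
plt.show()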
fig, axs = plt.subplots(6,3, figsize = (16,32))
count = 0
for _ in range(6):
    i = random.randint(0, len(image) - 1)
    gradcam, heatmap = grad_cam(image[i])
    axs[count][0].title.set_text("Original - " + original[i])
    axs[count][0].imshow(image[i])
    axs[count][1].title.set_text("Heatmap")
    axs[count][1].imshow(heatmap)
    axs[count][2].title.set_text("Prediction - " + prediction[i])
    axs[count][2].imshow(gradcam)
    count += 1
fig.tight_layout()
Future Work
Implement and visualize class activation maps with Grad-CAM++ and Score-CAM:
- Grad-CAM++ https://arxiv.org/abs/1710.11063
- Score-CAM https://arxiv.org/abs/1910.01279