MRCs with Deep Neural Networks: Part I

In this example we use a pretrained neural network to extract features from the images in a dataset. These features are then used to train and test MRCs in MRCs with Deep Neural Networks: Part II.

We use the pretrained ResNet18 model implementation from the PyTorch library. ResNet models were proposed in “Deep Residual Learning for Image Recognition”. Here we use the ResNet18 version, which contains 18 layers and is pretrained on the ImageNet dataset, which has 1000 different classes.

See also

For more information about ResNet models refer to

[1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

Introduction to Pretrained models, Transfer Learning and Feature Extraction

Deep convolutional neural network models may take days or even weeks to train on very large datasets. A way to shortcut this process is to reuse the weights of pretrained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition task. Top-performing models can be downloaded and used directly, or integrated into a new model for your own computer vision problems. Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem. Alternatively, a pretrained model may be used as a feature extraction model: the output of a layer prior to the model's output layer is used as input to a new classifier.

Load pretrained model and preprocess images

Firstly, we load the pretrained model torchvision.models.resnet18 and remove its last layer in order to obtain the features. We call this feature extraction model features_resnet18.

In the next lines we use torchvision.transforms.Compose to compose several transforms together. In line [2] the input is resized so that its smaller edge matches the given size, 256. That is, the image is resized maintaining the aspect ratio so that min(height, width) = 256. In line [3] we use the function CenterCrop to crop the given image at the center to (224, 224). If the image is smaller than the output size along any edge, it is padded with 0 and then center cropped.

If you use images of a different size, the kernels might not be able to extract features at their usual scale, since the patterns in the image are smaller (or larger), which may result in a difference in performance.

The function in line [4] converts a PIL Image or numpy.ndarray to a tensor. Finally, the Normalize function (lines [5, 6]) normalizes a tensor image with the mean and standard deviation required by ResNet18 (see the PyTorch ResNet18 documentation for details). Given mean (mean[1], ..., mean[n]) and std (std[1], ..., std[n]) for n channels, this transform normalizes each channel, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]. You can read more about torchvision.transforms in the PyTorch documentation.
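
To make the formula concrete, here is a minimal, self-contained illustration (the pixel value 0.5 is arbitrary): a red-channel intensity of 0.5 becomes (0.5 - 0.485) / 0.229 ≈ 0.066.

import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
pixel = torch.full((3, 1, 1), 0.5)  # a single RGB pixel as a 3x1x1 tensor
print(normalize(pixel)[0].item())   # (0.5 - 0.485) / 0.229 ≈ 0.0655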

import os
from os.path import join

import numpy as np
import tensorflow_datasets as tfds
import torch
import torch.nn as nn
import torchvision.models as models
from PIL import Image
from torchvision import transforms

from MRCpy.datasets import load_yearbook_path

resnet18 = models.resnet18(pretrained=True)
features_resnet18 = nn.Sequential(*(list(resnet18.children())[:-1]))
features_resnet18.eval()

transform = transforms.Compose(                              # [1]
    [transforms.Resize(256),                                 # [2]
     transforms.CenterCrop(224),                             # [3]
     transforms.ToTensor(),                                  # [4]
     transforms.Normalize(mean=[0.485, 0.456, 0.406],        # [5]
                          std=[0.229, 0.224, 0.225])])       # [6]
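
As a quick sanity check (not part of the original example), we can pass a random 224x224 RGB tensor through the truncated network to see the size of the extracted features:

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)    # random stand-in for one RGB image
    print(features_resnet18(dummy).shape)  # torch.Size([1, 512, 1, 1])

Each image therefore yields 512 features once the output is flattened.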

Using TensorFlow Datasets: MNIST & Cats vs Dogs

MNIST

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples and a test set of 10,000 examples. All images are grayscale with dimension (28, 28, 1). TensorFlow provides a convenient function to load this dataset directly, without downloading and storing it locally; you can read more in the TensorFlow documentation. It already provides the train and test partitions. We load the dataset with the function tensorflow_datasets.load, specifying as_supervised=True to load the labels together with the images; with_info=True returns the tuple (tf.data.Dataset, tfds.core.DatasetInfo), the latter containing the info associated with the builder.

[[ds_train, ds_test], ds_info] = tfds.load('mnist', split=['train', 'test'],
                                           as_supervised=True, with_info=True)

df_train = tfds.as_dataframe(ds_train, ds_info)
df_test = tfds.as_dataframe(ds_test, ds_info)

images_train = df_train['image'].to_numpy()
Y_train = df_train['label'].to_numpy()
images_test = df_test['image'].to_numpy()
Y_test = df_test['label'].to_numpy()

X_train = []
X_test = []


for img_array in images_train:
    # Convert the grayscale image into RGB because that is what the model expects
    img_array = np.repeat(img_array, 3, axis=-1)
    img = Image.fromarray(img_array, mode='RGB').resize((224, 224))
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)
    X_train.append(features_resnet18(batch_t).detach().numpy().flatten())

for img_array in images_test:
    # Convert the grayscale image into RGB because that is what the model expects
    img_array = np.repeat(img_array, 3, axis=-1)
    img = Image.fromarray(img_array, mode='RGB').resize((224, 224))
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)
    X_test.append(features_resnet18(batch_t).detach().numpy().flatten())
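
Extracting features one image at a time keeps the example simple but is slow. As an optional speed-up (a sketch, not part of the original pipeline), the transformed tensors can be stacked into a single batch and passed through the network in one call; here we batch the first 32 training images:

batch = torch.stack([
    transform(Image.fromarray(np.repeat(a, 3, axis=-1),
                              mode='RGB').resize((224, 224)))
    for a in images_train[:32]])
with torch.no_grad():
    feats = features_resnet18(batch)              # shape: (32, 512, 1, 1)
X_batch = feats.numpy().reshape(len(feats), -1)   # 32 rows of 512 features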

mnist_features_resnet18_train = np.concatenate(
    (X_train, np.reshape(Y_train, (-1, 1))), axis=1)

mnist_features_resnet18_test = np.concatenate(
    (X_test, np.reshape(Y_test, (-1, 1))), axis=1)

np.savetxt('mnist_features_resnet18_train.csv', mnist_features_resnet18_train,
           delimiter=',')
np.savetxt('mnist_features_resnet18_test.csv', mnist_features_resnet18_test,
           delimiter=',')

Cats vs Dogs

The Cats vs Dogs dataset is a database of 23,262 RGB images of cats and dogs released by Microsoft for the Asirra CAPTCHA (homepage). Cats are labeled 0 and dogs 1, with 11,658 and 11,604 images of each class, respectively. It is available in TensorFlow Datasets; you can check the details here.

[ds, ds_info] = tfds.load('cats_vs_dogs', split='train',
                          as_supervised=True, with_info=True)

df = tfds.as_dataframe(ds, ds_info)
images = df['image'].to_numpy()
labels = df['label'].to_numpy()

X_features = []
for img_array in images:
    img = Image.fromarray(img_array, mode='RGB')
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)
    X_features.append(features_resnet18(batch_t).detach().numpy().flatten())

catsvsdogs_features_resnet18 = np.concatenate((X_features,
                                               np.reshape(labels, (-1, 1))),
                                              axis=1)

np.savetxt('catsvsdogs_features_resnet18.csv', catsvsdogs_features_resnet18,
           delimiter=',')

Using a local dataset: Yearbook Dataset

In this example, we extract the features from a local dataset. We use the Yearbook dataset, a publicly available dataset of 37,921 frontal-facing American high school yearbook portraits taken from 1905 to 2013, labeled by gender. We consider binary classification labels identifying whether the person in the image is a man or a woman.

See also

More information about Yearbook dataset can be found in

[1] Ginosar, S., Rakelly, K., Sachs, S., Yin, B., & Efros, A. A. (2015). A century of portraits: A visual historical record of american high school yearbooks. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 1-7).

[2] Kumar, A., Ma, T., & Liang, P. (2020, November). Understanding self-training for gradual domain adaptation. In International Conference on Machine Learning (pp. 5468-5479). PMLR.

We take the names and paths of the images in the F (female) and M (male) folders and merge them into a single dataset ordered by date (image names start with the year the photo was taken). We encode the labels as 0 for F and 1 for M.

data_path = load_yearbook_path()
F_path = join(data_path, 'F')
F = os.listdir(F_path)
F = np.concatenate((np.reshape(F, (len(F), 1)), np.zeros((len(F), 1)),
                    np.reshape([join(F_path, x) for x in F], (len(F), 1))),
                   axis=1)
M_path = join(data_path, 'M')
M = os.listdir(M_path)
M = np.concatenate((np.reshape(M, (len(M), 1)), np.ones((len(M), 1)),
                    np.reshape([join(M_path, x) for x in M], (len(M), 1))),
                   axis=1)
data = np.concatenate((F, M), axis=0)
data = data[np.argsort(data[:, 0])]

paths = data[:, 2]
Y = data[:, 1].astype(float)  # cast back to float: data is a string array
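
A quick, purely illustrative inspection of the result: each row of data holds (file name, label, full path), sorted by the year prefix of the file name.

print(data.shape)    # (37921, 3): one row per portrait
print(np.unique(Y))  # [0. 1.]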

Next, we load the images and transform them using the function transform defined above to make them compatible with ResNet18. Then we extract the image features using features_resnet18 and flatten the output into a 1-D array that becomes a new instance of our feature dataset. We store this feature dataset, extracted with ResNet18, in a CSV file that is available in the dataset folder of the MRCpy library.

X_features = []

for img_path in paths:
    img = Image.open(img_path).convert('RGB')  # ensure 3 channels for the model
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)
    X_features.append(features_resnet18(batch_t).detach().numpy().flatten())

yearbook_features_resnet18 = np.concatenate((X_features,
                                             np.reshape(Y, (-1, 1))), axis=1)
np.savetxt('yearbook_features_resnet18.csv',
           yearbook_features_resnet18, delimiter=',')
