Hyperparameter Tuning: Upper Bound vs Cross-Validation

This example shows how to use the upper bounds provided by the MRC method in the MRCpy library for hyperparameter tuning, and compares this approach to cross-validation. We will see that tuning with the upper bound achieves performance similar to cross-validation while being about four times faster.

We use the '0-1' loss and the RandomFourierPhi feature map (phi='fourier'). We tune the regularization parameter s of the feature mapping over a random grid, using the standard RandomizedSearchCV method from scikit-learn as the cross-validation baseline.

In the following example, we use Nesterov's subgradient solver for the MRC classifier by setting the parameter solver='subgrad'.

# Import needed modules
import random
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.spatial import distance
from sklearn import preprocessing
from sklearn.model_selection import RandomizedSearchCV, train_test_split

from MRCpy import MRC
from MRCpy.datasets import *
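
As a quick illustration of the quantity we will optimize, here is a sketch (not part of the benchmark below, reusing the imports above and the load_mammographic loader) that fits a single MRC with a fixed value of s and queries the upper bound provided by the method:

# Sketch: fit one MRC with a fixed s and inspect its upper bound
X, Y = load_mammographic()
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25,
                                                    random_state=0)
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train = std_scale.transform(X_train)
X_test = std_scale.transform(X_test)

clf = MRC(phi='fourier', s=0.3, random_state=0, deterministic=False,
          solver='subgrad')
clf.fit(X_train, Y_train)
print('Upper bound:', clf.get_upper_bound())
print('Test error :', np.average(clf.predict(X_test) != Y_test))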

Random Grid using Upper Bound parameter

We draw n_iter random values for the parameter from a given range and keep the value that minimizes the upper bound provided by the MRC method. On each repetition we compute and store the upper bound for each candidate value of s. We set n_iter = 10 in the following code because it is the default value of RandomizedSearchCV.

def run_RandomGridUpper(X_train, Y_train, X_test, Y_test,
                        s_ini, s_fin, index):
    n_iter = 10
    startTime = time.time()
    # Draw n_iter candidate values of s uniformly at random in [s_ini, s_fin]
    s_id = [(s_fin - s_ini) * random.random() + s_ini for i in range(n_iter)]
    upps = np.zeros(n_iter)

    # Fit one MRC per candidate value and record its upper bound
    for i in range(n_iter):
        clf = MRC(phi='fourier', s=s_id[i], random_state=0,
                  deterministic=False, solver='subgrad')
        clf.fit(X_train, Y_train)
        upps[i] = clf.get_upper_bound()

    # Refit with the candidate that minimizes the upper bound
    # and evaluate its test error
    min_upp = np.min(upps)
    best_s = s_id[np.argmin(upps)]
    clf = MRC(phi='fourier', s=best_s, random_state=0,
              deterministic=False, solver='subgrad')
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    best_err = np.average(Y_pred != Y_test)
    totalTime = time.time() - startTime

    return pd.DataFrame({'upper': [min_upp], 's': best_s,
                         'time': totalTime, 'error': best_err})

RandomGridCV

As a baseline, we tune s using scikit-learn's RandomizedSearchCV, which evaluates the same number of randomly drawn candidate values and keeps the one with the best cross-validated score.

def run_RandomGridCV(X_train, Y_train, X_test, Y_test,
                     s_ini, s_fin, index):
    # The data arrives already split and standardized by the caller
    n_iter = 10
    startTime = time.time()

    # Candidate values of s for the random search
    s_values = np.linspace(s_ini, s_fin, num=5000)
    param = {'s': s_values}

    # Random search with cross-validation over the candidate values
    mrc = MRC(phi='fourier', random_state=0, deterministic=False,
              solver='subgrad')
    clf = RandomizedSearchCV(mrc, param, random_state=0, n_iter=n_iter)
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    error = np.average(Y_pred != Y_test)

    totalTime = time.time() - startTime

    return pd.DataFrame({'upper': [clf.best_estimator_.get_upper_bound()],
                         's': clf.best_estimator_.s,
                         'time': totalTime, 'error': error})
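
Before running the full benchmark, a quick check on a single train-test split of one dataset (a sketch that is not part of the original comparison, assuming the load_mammographic loader and the imports above) shows how the two helpers are called and already hints at the difference in running time:

# Sketch: compare both tuning helpers on a single split
X, Y = load_mammographic()
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25,
                                                    random_state=0)
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train = std_scale.transform(X_train)
X_test = std_scale.transform(X_test)

res_upper = run_RandomGridUpper(X_train, Y_train, X_test, Y_test, 0.3, 0.6, 0)
res_cv = run_RandomGridCV(X_train, Y_train, X_test, Y_test, 0.3, 0.6, 0)
print(pd.concat([res_upper, res_cv], keys=['upper bound', 'cross-validation']))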

Comparison

We run both of the previous hyperparameter-tuning methods over a set of different datasets and compare their performance. Before calling them, we set a range of values for the hyperparameter; empirical knowledge tells us that the best values of s lie between 0.3 and 0.6.

We repeat this process several times to make sure the results do not depend heavily on the particular train-test split.

def plot_table(df, title, color):
    fig, ax = plt.subplots()
    # hide axes
    fig.patch.set_visible(False)
    ax.axis('off')
    ax.axis('tight')
    t = ax.table(cellText=df.values, colLabels=df.columns, loc='center',
                 colColours=color, cellColours=[color] * len(df))
    t.auto_set_font_size(False)
    t.set_fontsize(8)
    t.auto_set_column_width(col=list(range(len(df.columns))))
    fig.tight_layout()
    plt.title(title)
    plt.show()


loaders = [load_mammographic, load_haberman, load_indian_liver,
           load_diabetes, load_credit]
dataNameList = ["mammographic", "haberman", "indian_liver",
                "diabetes", "credit"]

dfCV = pd.DataFrame()
dfUpper = pd.DataFrame()
f = '%1.3g'  # format
for j, load in enumerate(loaders):

    # Loading the dataset
    X, Y = load()
    dataName = dataNameList[j]

    # In order to avoid possible bias introduced by the choice of the
    # train-test split, we repeat this process several (10) times and
    # average the obtained results
    dfCV_aux = pd.DataFrame()
    dfUpper_aux = pd.DataFrame()
    for rep in range(10):
        X_train, X_test, Y_train, Y_test = \
            train_test_split(X, Y, test_size=0.25, random_state=rep)
        # Normalizing the data
        std_scale = preprocessing.StandardScaler().fit(X_train)
        X_train = std_scale.transform(X_train)
        X_test = std_scale.transform(X_test)

        s_ini = 0.3
        s_fin = 0.6

        # We tune the parameter using both methods and store the results
        dfCV_aux = pd.concat([dfCV_aux,
            run_RandomGridCV(X_train, Y_train, X_test, Y_test,
                             s_ini, s_fin, rep)], ignore_index=True)
        dfUpper_aux = pd.concat([dfUpper_aux,
            run_RandomGridUpper(X_train, Y_train, X_test, Y_test,
                                s_ini, s_fin, rep)], ignore_index=True)

    # We save the mean results over the 10 repetitions
    mean_err = f % np.mean(dfCV_aux['error']) + ' ± ' + \
        f % np.std(dfCV_aux['error'])
    mean_s = f % np.mean(dfCV_aux['s']) + ' ± ' + f % np.std(dfCV_aux['s'])
    mean_time = f % np.mean(dfCV_aux['time']) + ' ± ' + \
        f % np.std(dfCV_aux['time'])
    mean_upper = f % np.mean(dfCV_aux['upper']) + ' ± ' + \
        f % np.std(dfCV_aux['upper'])
    dfCV = pd.concat([dfCV, pd.DataFrame({'dataset': [dataName], 'error': mean_err,
                        's': mean_s,
                        'upper': mean_upper,
                        'time': mean_time})], ignore_index=True)
    mean_err = f % np.mean(dfUpper_aux['error']) + ' ± ' + \
        f % np.std(dfUpper_aux['error'])
    mean_s = f % np.mean(dfUpper_aux['s']) + ' ± ' + \
        f % np.std(dfUpper_aux['s'])
    mean_time = f % np.mean(dfUpper_aux['time']) + ' ± ' + \
        f % np.std(dfUpper_aux['time'])
    mean_upper = f % np.mean(dfUpper_aux['upper']) + ' ± ' + \
        f % np.std(dfUpper_aux['upper'])
    dfUpper = pd.concat([dfUpper, pd.DataFrame({'dataset': [dataName], 'error': mean_err,
                              's': mean_s,
                              'upper': mean_upper,
                              'time': mean_time})], ignore_index=True)
dfCV.style.set_caption('RandomGridCV Results').set_properties(
    **{'background-color': 'lightskyblue'}, subset=['error', 'time'])
RandomGridCV Results

   dataset        error            s                upper             time (s)
0  mammographic   0.212 ± 0.024    0.433 ± 0.0689   0.227 ± 0.0126    36.7 ± 0.365
1  haberman       0.274 ± 0.0481   0.531 ± 0.0696   0.271 ± 0.0162    30.4 ± 0.277
2  indian_liver   0.288 ± 0.0179   0.44 ± 0.0487    0.296 ± 0.00526   37.6 ± 0.422
3  diabetes       0.277 ± 0.0302   0.48 ± 0.0942    0.288 ± 0.007     37.2 ± 0.703
4  credit         0.205 ± 0.0247   0.512 ± 0.0759   0.2 ± 0.00776     34.5 ± 0.157


dfUpper.style.set_caption('RandomGridUpper Results').set_properties(
    **{'background-color': 'lightskyblue'}, subset=['error', 'time'])
RandomGridUpper Results

   dataset        error            s                upper             time (s)
0  mammographic   0.216 ± 0.0222   0.329 ± 0.025    0.224 ± 0.0125    8.28 ± 0.103
1  haberman       0.283 ± 0.0502   0.338 ± 0.0262   0.261 ± 0.0153    6.87 ± 0.105
2  indian_liver   0.288 ± 0.0185   0.339 ± 0.028    0.293 ± 0.00625   8.71 ± 0.175
3  diabetes       0.294 ± 0.0401   0.337 ± 0.035    0.281 ± 0.00758   8.53 ± 0.0622
4  credit         0.199 ± 0.0287   0.331 ± 0.0286   0.188 ± 0.00827   8.04 ± 0.0954
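
The plot_table helper defined earlier is not invoked in the script above; as a sketch of how it could be used, the same summary DataFrames can also be rendered as matplotlib figures (the light blue background mirrors the styled tables):

# Sketch: render the summary DataFrames with the plot_table helper
color = ['lightskyblue'] * len(dfCV.columns)
plot_table(dfCV, 'RandomGridCV Results', color)
plot_table(dfUpper, 'RandomGridUpper Results', color)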


Results

Comparing the tables above, we see that the two methods, RandomGridCV and the random grid using upper bounds, perform very similarly: one can do slightly better than the other depending on the dataset, but the error rates are in the same range overall.

Furthermore, tuning with the upper bounds greatly improves the running time, being around four times faster than the usual random grid with cross-validation.

We also note that, for every dataset, the optimal value of the parameter s is always around 0.3, which is why this value was chosen as the default in the library.
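
Since the tuned values of s cluster around this default, a reasonable shortcut (a sketch under the assumption that the default suits your data, reusing the imports and loaders above) is to skip tuning altogether and fit the classifier with s left at its default value:

# Sketch: fit an MRC without tuning, leaving s at its default value
X, Y = load_mammographic()
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25,
                                                    random_state=0)
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train = std_scale.transform(X_train)
X_test = std_scale.transform(X_test)

clf = MRC(phi='fourier', random_state=0, deterministic=False,
          solver='subgrad')  # s not specified, so the library default is used
clf.fit(X_train, Y_train)
print('Upper bound:', clf.get_upper_bound())
print('Test error :', np.average(clf.predict(X_test) != Y_test))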

Total running time of the script: (41 minutes 40.407 seconds)
