Getting started

Installation and Setup

Installation

The latest built version of MRCpy can be installed with pip: `pip install MRCpy`

Alternatively, the latest development code of MRCpy can be installed from a clone of the GitHub repository:

git clone https://github.com/MachineLearningBCAM/MRCpy.git
cd MRCpy
python3 setup.py install

You can then run all the checks with `pytest tests` (this requires the pytest package).

Dependencies

  • Python >= 3.6

  • numpy == 1.25.0

  • scipy == 1.10.0

  • scikit-learn == 1.2.2

  • cvxpy == 1.3.1

  • pandas == 2.2.0

  • mosek

  • pyarrow

  • gurobipy
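
To compare an existing environment against the pins listed above, a small check along the following lines may help. It is only a sketch: it relies on the standard-library `importlib.metadata` module (Python 3.8 or later), the helper name `check_requirements` is hypothetical, and the distribution names are assumed to match the package names in the list.

```python
from importlib import metadata

# Pins copied from the dependency list above; unpinned packages map to None.
# Assumption: each distribution is published under the name used in the list.
REQUIREMENTS = {
    "numpy": "1.25.0",
    "scipy": "1.10.0",
    "scikit-learn": "1.2.2",
    "cvxpy": "1.3.1",
    "pandas": "2.2.0",
    "mosek": None,
    "pyarrow": None,
    "gurobipy": None,
}


def check_requirements(requirements):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pinned in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if pinned is None or installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(REQUIREMENTS).items():
        print(f"{package}: {status}")
```

A "missing" or mismatch entry does not necessarily mean MRCpy will fail; it only flags a difference from the versions listed here.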

Quick start

This example loads the mammographic dataset, and trains the MRC classifier using 0-1 loss (i.e., the default loss).

from MRCpy import MRC
from MRCpy.datasets import load_mammographic
from sklearn.model_selection import train_test_split

# Load the mammographic dataset
X, Y = load_mammographic(with_info=False)

# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create the MRC classifier using default loss (0-1)
clf = MRC()

# Fit the classifier on the training data
clf.fit(X_train, y_train)

# Bounds on the classification error (only for MRC)
lower_error = clf.get_lower_bound()
upper_error = clf.get_upper_bound()

# Compute the accuracy on the test set
accuracy = clf.score(X_test, y_test)

Dataset Loaders

The MRCpy library includes a variety of datasets, each with a description and a convenient loader function. The functions described below can be imported from MRCpy.datasets. Note that the adult, magic, mnist, cats vs dogs, and yearbook datasets are not available in the built version of MRCpy (installed through pip).

normalizeLabels(origY)

Normalize the labels of the instances to the range 0, …, r-1 for r classes.
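
The mapping can be pictured with a short plain-Python sketch. This is not the library's implementation: the function name `normalize_labels_sketch` is hypothetical, and it assumes the distinct labels are numbered in sorted order.

```python
def normalize_labels_sketch(labels):
    """Map arbitrary class labels to integers 0, ..., r-1.

    Assumption: the r distinct labels are numbered in sorted order.
    """
    classes = sorted(set(labels))                        # the r distinct labels
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


print(normalize_labels_sketch([3, 7, 3, 10, 7]))         # [0, 1, 0, 2, 1]
```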

load_adult(with_info=False)

Load and return the adult incomes prediction dataset (classification).

Classes: 2
Samples per class: [37155, 11687]
Samples total: 48842
Dimensionality: 14
Features: int, positive

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the dataset.

(data, target) : tuple, if with_info is False
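
Because a loader can return either a tuple or a Bunch, calling code may want to accept both shapes. The helper below is a sketch under assumptions: `as_data_target` is a hypothetical name, the Bunch is assumed to support key access as a dictionary-like object would, and the sample values are stand-ins rather than real dataset contents.

```python
def as_data_target(result):
    """Return (data, target) from either a tuple or a Bunch-like mapping."""
    if isinstance(result, tuple):
        data, target = result
        return data, target
    # Assumption: the Bunch exposes its fields through key access.
    return result["data"], result["target"]


# Stand-in values only; a real call would be e.g. load_adult(with_info=...).
tuple_result = ([[1, 2], [3, 4]], [0, 1])
bunch_like = {"data": [[1, 2], [3, 4]], "target": [0, 1], "DESCR": "demo"}

X, y = as_data_target(tuple_result)
X_b, y_b = as_data_target(bunch_like)
print(X == X_b and y == y_b)
```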

load_diabetes(with_info=False)

Load and return the Pima Indians Diabetes dataset (classification).

Classes: 2
Samples per class: [500, 168]
Samples total: 668
Dimensionality: 8
Features: int, float, positive

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the dataset.

(data, target) : tuple, if with_info is False

load_iris(with_info=False)

Load and return the Iris Plants Dataset (classification).

Classes: 3
Samples per class: [50, 50, 50]
Samples total: 150
Dimensionality: 4
Features: int, float, positive

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the dataset.

(data, target) : tuple, if with_info is False
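
The summary rows given for each dataset (classes, samples per class, samples total) can be recomputed from a target array, which is a quick way to check a loaded dataset against its description. The sketch below uses only the standard library; `class_summary` is a hypothetical helper, and the target list is a stand-in shaped like the iris labels rather than the output of a loader.

```python
from collections import Counter


def class_summary(target):
    """Summarize a sequence of class labels as in the dataset descriptions."""
    counts = Counter(target)
    per_class = [counts[label] for label in sorted(counts)]
    return {
        "classes": len(counts),
        "samples_per_class": per_class,
        "samples_total": sum(per_class),
    }


# Stand-in target with the same class balance as the iris description.
iris_like_target = [0] * 50 + [1] * 50 + [2] * 50
print(class_summary(iris_like_target))
# {'classes': 3, 'samples_per_class': [50, 50, 50], 'samples_total': 150}
```

For a NumPy target array, passing `target.tolist()` keeps the labels as plain Python values.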

load_redwine(with_info=False)

Load and return the Red Wine Dataset (classification).

Classes: 10
Samples per class: [1599, 4898]
Samples total: 6497
Dimensionality: 11
Features: int, float, positive

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

bunch : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the dataset.

(data, target) : tuple, if with_info is False

load_credit(with_info=False)

Load and return the Credit Approval prediction dataset (classification).

Classes: 2
Samples total: 690
Dimensionality: 15
Features: int, float, positive

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the credit csv dataset.

(data, target) : tuple, if with_info is False

load_magic(with_info=False)

Load and return the Magic Gamma Telescope dataset (classification).

Classes: 2
Samples per class: [6688, 12332]
Samples total: 19020
Dimensionality: 10
Features: float

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the magic csv dataset.

(data, target) : tuple, if with_info is False

load_haberman(with_info=False)

Load and return the Haberman’s Survival Data Set (classification).

Classes: 2
Samples per class: [225, 81]
Samples total: 306
Dimensionality: 3
Features: int

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the haberman csv dataset.

(data, target) : tuple, if with_info is False

load_mammographic(with_info=False)

Load and return the Mammographic Mass Data Set (classification).

Classes: 2
Samples per class: [516, 445]
Samples total: 961
Dimensionality: 5
Features: int

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the mammographic csv dataset.

(data, target) : tuple, if with_info is False

load_indian_liver(with_info=False)

Load and return the Indian Liver Patient Data Set (classification).

Classes: 2
Samples per class: [416, 167]
Samples total: 583
Dimensionality: 10
Features: int, float
Missing values: 4 (nan)

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

data : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the indian liver csv dataset.

(data, target) : tuple, if with_info is False

load_mnist_features_resnet18(with_info=False, split=False)

Load and return the MNIST Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes: 10
Samples per class (train): [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
Samples per class (test): [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]
Samples total (train): 60000
Samples total (test): 10000
Samples total: 70000
Dimensionality: 512
Features: float

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

split : boolean, default=False

If True, returns a dictionary instead of an array in the place of the data.

Returns

bunch : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the MNIST ResNet18 features csv dataset. If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.

(data, target) : tuple, if with_info is False

If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.

load_catsvsdogs_features_resnet18(with_info=False)

Load and return the Cats vs Dogs Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes: 2
Samples per class: [11658, 11604]
Samples total: 23262
Dimensionality: 512
Features: float

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Returns

bunch : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the Cats vs Dogs ResNet18 features csv dataset.

(data, target) : tuple, if with_info is False

load_yearbook_features_resnet18(with_info=False, with_attributes=False)

Load and return the Yearbook Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes: 2
Samples per class: [20248, 17673]
Samples total: 37921
Dimensionality: 512
Features: float

Parameters

with_info : boolean, default=False

If False, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

with_attributes : boolean, default=False

If True, returns an additional dictionary with further attributes of the portraits: year, state, city, and school. The key ‘attr_labels’ holds the label of each column, and ‘attr_data’ holds the attribute data as a numpy array.

Returns

bunch : Bunch, if with_info is True

Dictionary-like object whose main attributes are: ‘data’, the data to learn; ‘target’, the classification targets; ‘DESCR’, the full description of the dataset; and ‘filename’, the physical location of the Yearbook ResNet18 features csv dataset.

(data, target) : tuple, if with_info is False
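
The attribute dictionary pairs column labels with column data, so one attribute can be pulled out by looking up its label. The following is a sketch only: `attribute_column` is a hypothetical helper, the column order and all values in the stand-in dictionary are invented for illustration, and a nested list stands in for the numpy array.

```python
def attribute_column(attributes, name):
    """Return the values of one attribute, looked up by its column label."""
    column = list(attributes["attr_labels"]).index(name)
    return [row[column] for row in attributes["attr_data"]]


# Hypothetical stand-in; the real dictionary comes from the loader
# when with_attributes=True, with 'attr_data' as a numpy array.
attributes = {
    "attr_labels": ["year", "state", "city", "school"],
    "attr_data": [
        [1950, "State A", "City A", "School A"],
        [1975, "State B", "City B", "School B"],
    ],
}

print(attribute_column(attributes, "year"))   # [1950, 1975]
```

Row-wise iteration also works on a two-dimensional numpy array, so the same helper should apply to the real ‘attr_data’ if it has one row per portrait.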