Getting started

Installation and Setup

Installation

The latest built version of MRCpy can be installed with pip: `pip install MRCpy`

Alternatively, the latest code (under development) of MRCpy can be installed from the GitHub source repository by running the following commands in order: `git clone https://github.com/MachineLearningBCAM/MRCpy.git`, `cd MRCpy`, and `python3 setup.py install`. You can then run `pytest tests` to execute all the checks (this requires the pytest package to be installed).

Dependencies

Python Runtime Services \(\geq\) 3.6
numpy \(==\) 1.25.0
scipy \(==\) 1.10.0
scikit-learn \(==\) 1.2.2
cvxpy \(==\) 1.3.1
pandas \(==\) 2.2.0
mosek
pyarrow
gurobipy
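
To see how a local environment compares with the pinned versions listed above, something like the following sketch may help. It uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `check_dependencies` and the `PINNED` table are illustrative and not part of MRCpy.

```python
from importlib import metadata

# Pins copied from the dependency list above; None means no version is pinned there.
PINNED = {
    "numpy": "1.25.0",
    "scipy": "1.10.0",
    "scikit-learn": "1.2.2",
    "cvxpy": "1.3.1",
    "pandas": "2.2.0",
    "mosek": None,
    "pyarrow": None,
    "gurobipy": None,
}


def check_dependencies(pinned=PINNED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and (wanted is None or installed == wanted)
        report[name] = (installed, ok)
    return report
```

Calling `check_dependencies()` and printing the resulting dictionary shows which packages are missing or installed at a different version.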

Quick start

This example loads the mammographic dataset, and trains the MRC classifier using 0-1 loss (i.e., the default loss).

from MRCpy import MRC
from MRCpy.datasets import load_mammographic
from sklearn.model_selection import train_test_split

# Load the mammographic dataset
X, Y = load_mammographic(with_info=False)

# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create the MRC classifier using default loss (0-1)
clf = MRC()

# Fit the classifier on the training data
clf.fit(X_train, y_train)

# Bounds on the classification error (only for MRC)
lower_error = clf.get_lower_bound()
upper_error = clf.get_upper_bound()

# Compute the accuracy on the test set
accuracy = clf.score(X_test, y_test)
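
The bounds above refer to the classification error, while `score` returns an accuracy, so the two are easier to compare after converting one into the other. The sketch below shows that conversion with a hypothetical helper, `error_within_bounds`, and made-up numbers (they are not MRCpy results). A finite test set can produce an error outside the interval, so the check is informative rather than a guarantee.

```python
def error_within_bounds(accuracy, lower_error, upper_error):
    """Return the test error and whether it lies in [lower_error, upper_error]."""
    test_error = 1.0 - accuracy
    return test_error, lower_error <= test_error <= upper_error


# Hypothetical values for illustration only
test_error, inside = error_within_bounds(
    accuracy=0.80, lower_error=0.15, upper_error=0.25
)
print(f"test error = {test_error:.2f}, inside bounds: {inside}")
```

In the Quick start, the three arguments would be `accuracy`, `lower_error`, and `upper_error` as computed there.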

Dataset Loaders

The MRCpy library includes a variety of datasets, each with a description and a convenient loader function. The functions that can be imported from MRCpy.datasets are described below. Note that the adult, magic, mnist, cats vs dogs, and yearbook datasets are not available in the built version of MRCpy (the one installed through pip).

`normalizeLabels(origY)`

Normalize the labels of the instances to the range 0, …, r-1 for r classes.
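
As an illustration of what mapping labels to 0, …, r-1 means, here is a minimal pure-Python sketch. The function `normalize_labels_sketch` is hypothetical, and it assumes that classes are numbered in sorted order of their original values; the library's own function may order them differently.

```python
def normalize_labels_sketch(labels):
    """Map arbitrary sortable labels to the integers 0..r-1 (sorted order assumed)."""
    classes = sorted(set(labels))                      # the r distinct classes
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


print(normalize_labels_sketch([3, 7, 3, 10]))        # [0, 1, 0, 2]
print(normalize_labels_sketch(["no", "yes", "no"]))  # [0, 1, 0]
```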

`load_adult(with_info=False)`

Load and return the adult incomes prediction dataset (classification).

Classes	2
Samples per class	[37155,11687]
Samples total	48842
Dimensionality	14
Features	int, positive

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.

(data, target) : tuple if with_info is False
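
The two return shapes shared by the loaders can be pictured with a toy stand-in. Everything in this sketch is hypothetical (`ToyBunch`, `load_toy`, and the data are invented for illustration); it follows the Quick start, where the result of a call with `with_info=False` is unpacked into `X, Y`, and assumes that `with_info=True` gives the dictionary-like object described above.

```python
class ToyBunch(dict):
    """Dictionary whose keys can also be read as attributes (stand-in for Bunch)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc


def load_toy(with_info=False):
    """Toy loader mimicking the with_info switch; not an MRCpy function."""
    data = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2]]
    target = [0, 1, 0]
    if not with_info:
        return data, target
    return ToyBunch(data=data, target=target,
                    DESCR="Toy dataset", filename="toy.csv")


X, Y = load_toy()                  # tuple form
bunch = load_toy(with_info=True)   # dictionary-like form
print(bunch.DESCR, len(bunch["data"]), Y)
```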

`load_diabetes(with_info=False)`

Load and return the Pima Indians Diabetes dataset (classification).

Classes	2
Samples per class	[500,168]
Samples total	668
Dimensionality	8
Features	int, float, positive

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.

(data, target) : tuple if with_info is False

`load_iris(with_info=False)`

Load and return the Iris Plants Dataset (classification).

Classes	3
Samples per class	[50,50,50]
Samples total	150
Dimensionality	4
Features	int, float, positive

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.

(data, target) : tuple if with_info is False

`load_redwine(with_info=False)`

Load and return the Red Wine Dataset (classification).

Classes	10
Samples per class	[1599, 4898]
Samples total	6497
Dimensionality	11
Features	int, float, positive

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

bunch : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.

(data, target) : tuple if with_info is False


`load_credit(with_info=False)`

Load and return the Credit Approval prediction dataset (classification).

Classes	2
Samples total	690
Dimensionality	15
Features	int, float, positive

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the credit csv dataset.

(data, target) : tuple if with_info is False

`load_magic(with_info=False)`

Load and return the Magic Gamma Telescope dataset (classification).

Classes	2
Samples per class	[6688,12332]
Samples total	19020
Dimensionality	10
Features	float

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the magic csv dataset.

(data, target) : tuple if with_info is False

`load_haberman(with_info=False)`

Load and return the Haberman’s Survival Data Set (classification).

Classes	2
Samples per class	[225, 81]
Samples total	306
Dimensionality	3
Features	int

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the haberman csv dataset.

(data, target) : tuple if with_info is False

`load_mammographic(with_info=False)`

Load and return the Mammographic Mass Data Set (classification).

Classes	2
Samples per class	[516, 445]
Samples total	961
Dimensionality	5
Features	int

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the mammographic csv dataset.

(data, target) : tuple if with_info is False

`load_indian_liver(with_info=False)`

Load and return the Indian Liver Patient Data Set (classification).

Classes	2
Samples per class	[416, 167]
Samples total	583
Dimensionality	10
Features	int, float
Missing Values	4 (nan)

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

data : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the indian liver csv dataset.

(data, target) : tuple if with_info is False
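
Because this dataset lists missing values (nan), some preprocessing may be needed before training. The text does not say how MRCpy treats nan entries, so the sketch below shows just one common option, dropping incomplete rows, using a hypothetical helper and toy data. Imputing a value instead of dropping rows is an equally reasonable choice.

```python
import math


def drop_rows_with_nan(X, Y):
    """Keep only the rows of X (and the matching labels in Y) without nan entries."""
    kept = [(row, y) for row, y in zip(X, Y)
            if not any(math.isnan(v) for v in row)]
    X_clean = [row for row, _ in kept]
    Y_clean = [y for _, y in kept]
    return X_clean, Y_clean


# Toy rows for illustration; the second one has a missing value
X = [[65.0, 0.7], [62.0, float("nan")], [58.0, 1.0]]
Y = [0, 1, 0]
X_clean, Y_clean = drop_rows_with_nan(X, Y)
print(len(X_clean), Y_clean)  # 2 [0, 0]
```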

`load_mnist_features_resnet18(with_info=False, split=False)`

Load and return the MNIST Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes	10
Samples per class Train	[5923,6742,5958,6131,5842,5421,5918,6265,5851,5949]
Samples per class Test	[980,1135,1032,1010,982,892,958,1028,974,1009]
Samples total Train	60000
Samples total Test	10000
Samples total	70000
Dimensionality	512
Features	float

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.
split : boolean, default=False. If True, returns a dictionary instead of an array in the place of the data.

Returns

bunch : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the MNIST ResNet18 features csv dataset. If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.

(data, target) : tuple if with_info is False. If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.

`load_catsvsdogs_features_resnet18(with_info=False)`

Load and return the Cats vs Dogs Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes	2
Samples per class	[11658,11604]
Samples total	23262
Dimensionality	512
Features	float

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.

Returns

bunch : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the Cats vs Dogs ResNet18 features csv dataset.

(data, target) : tuple if with_info is False

`load_yearbook_features_resnet18(with_info=False, with_attributes=False)`

Load and return the Yearbook Data Set features extracted using a pretrained ResNet18 neural network (classification).

Classes	2
Samples per class	[20248,17673]
Samples total	37921
Dimensionality	512
Features	float

Parameters

with_info : boolean, default=False. If True, returns a Bunch object; if False, returns the tuple (data, target). See below for more information about the data and target object.
with_attributes : boolean, default=False. If True, returns an additional dictionary with further attributes of the portraits: year, state, city, and school. The key ‘attr_labels’ holds the label of each column, and ‘attr_data’ holds the attribute data as a numpy array.

Returns

bunch : Bunch. Dictionary-like object whose main attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the Yearbook ResNet18 features csv dataset.

(data, target) : tuple if with_info is False
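
The extra dictionary returned with `with_attributes=True` pairs column labels (‘attr_labels’) with a table of values (‘attr_data’). The sketch below shows one way to pull out a single attribute column by its label. The miniature dictionary and the helper `attribute_column` are invented for illustration, and plain lists stand in for the numpy array; with a real array the same lookup would typically be `attr_data[:, col]`.

```python
# Hypothetical miniature of the attributes dictionary; real values come from the loader.
attributes = {
    "attr_labels": ["year", "state", "city", "school"],
    "attr_data": [
        [1950, "CA", "Fresno", "School A"],
        [1972, "NY", "Albany", "School B"],
        [1950, "TX", "Austin", "School C"],
    ],
}


def attribute_column(attributes, label):
    """Return the values of one attribute, looked up by its column label."""
    col = attributes["attr_labels"].index(label)
    return [row[col] for row in attributes["attr_data"]]


years = attribute_column(attributes, "year")
print(years)  # [1950, 1972, 1950]
```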
