Getting started
Installation and Setup
Installation
The latest built version of MRCpy can be installed using pip as
`
pip install MRCpy
`
Alternatively, the latest code (under development) of MRCpy can be installed by cloning the GitHub repository and installing from source as
`
git clone https://github.com/MachineLearningBCAM/MRCpy.git
cd MRCpy
python3 setup.py install
`
You may then use pytest tests to run all the checks (you will need to have the pytest package installed).
Dependencies
Quick start
This example loads the mammographic dataset and trains the MRC classifier using the 0-1 loss (i.e., the default loss).
from MRCpy import MRC
from MRCpy.datasets import load_mammographic
from sklearn.model_selection import train_test_split
# Load the mammographic dataset
X, Y = load_mammographic(with_info=False)
# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# Create the MRC classifier using default loss (0-1)
clf = MRC()
# Fit the classifier on the training data
clf.fit(X_train, y_train)
# Bounds on the classification error (only for MRC)
lower_error = clf.get_lower_bound()
upper_error = clf.get_upper_bound()
# Compute the accuracy on the test set
accuracy = clf.score(X_test, y_test)
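The bounds and the accuracy computed above can be compared directly, since the test error is one minus the accuracy. The following is a minimal sketch with placeholder numbers rather than real MRCpy output; the helper name error_within_bounds is hypothetical and not part of the library.

```python
def error_within_bounds(accuracy, lower_error, upper_error):
    """Return the test error and whether it lies in [lower_error, upper_error]."""
    test_error = 1.0 - accuracy
    return test_error, lower_error <= test_error <= upper_error


# Placeholder values for illustration only. In practice they would come from
# clf.score(...), clf.get_lower_bound() and clf.get_upper_bound().
test_error, inside = error_within_bounds(
    accuracy=0.80, lower_error=0.15, upper_error=0.25
)
print(f"test error = {test_error:.2f}, within bounds: {inside}")
```

On a finite test set the empirical error is only an estimate, so a value slightly outside the bounds does not by itself indicate a bug.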
Dataset Loaders
The MRCpy library incorporates a variety of datasets, along with descriptions and convenient loader functions for each dataset. Next, we describe the functions you can find and import from MRCpy.datasets. Note that the datasets adult, magic, mnist, cats vs dogs and yearbook are not available in the built version of MRCpy (installed through pip).
normalizeLabels(origY)
Normalize the labels of the instances in the range 0,…, r-1 for r classes.
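To illustrate what mapping labels into the range 0,…, r-1 means, here is a minimal pure-Python sketch. It is not the library's implementation: the function name normalize_labels_sketch is hypothetical, and assigning indices in sorted label order is an assumption.

```python
def normalize_labels_sketch(labels):
    """Map arbitrary labels to integers 0..r-1, in sorted label order."""
    classes = sorted(set(labels))                      # the r distinct labels
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


# Three distinct labels (3, 7, 10) become 0, 1, 2.
print(normalize_labels_sketch([3, 7, 3, 10, 7]))
```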
load_adult(with_info=False)
Load and return the adult incomes prediction dataset (classification).
Classes | 2
Samples per class | [37155, 11687]
Samples total | 48842
Dimensionality | 14
Features | int, positive
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.
- (data, target) : tuple, if with_info is False
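The two return shapes can be handled as sketched below. The sketch follows the convention used in the quick start, where with_info=False yields the (data, target) pair. The function load_toy is a hypothetical stand-in, not an MRCpy loader, and a plain dict stands in for the Bunch object.

```python
def load_toy(with_info=False):
    """Hypothetical stand-in mimicking the with_info convention of the loaders."""
    data = [[39, 13], [50, 13], [38, 9]]
    target = [0, 0, 1]
    if with_info:
        # A plain dict stands in for the dictionary-like Bunch object.
        return {
            "data": data,
            "target": target,
            "DESCR": "toy dataset",
            "filename": "toy.csv",
        }
    return data, target


X, y = load_toy()                    # tuple form
bunch = load_toy(with_info=True)     # dictionary-like form
X_info, y_info = bunch["data"], bunch["target"]
print(X == X_info, y == y_info)
```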
load_diabetes(with_info=False)
Load and return the Pima Indians Diabetes dataset (classification).
Classes | 2
Samples per class | [500, 168]
Samples total | 668
Dimensionality | 8
Features | int, float, positive
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.
- (data, target) : tuple, if with_info is False
load_iris(with_info=False)
Load and return the Iris Plants Dataset (classification).
Classes | 3
Samples per class | [50, 50, 50]
Samples total | 150
Dimensionality | 4
Features | int, float, positive
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.
- (data, target) : tuple, if with_info is False
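The "Samples per class" and "Samples total" rows can be recomputed from any target vector. The sketch below uses a synthetic target with the class sizes listed for iris; the helper name samples_per_class is hypothetical.

```python
from collections import Counter


def samples_per_class(target):
    """Return the number of samples for each class, in sorted class order."""
    counts = Counter(target)
    return [counts[label] for label in sorted(counts)]


# Synthetic target vector with three balanced classes of 50 samples each.
toy_target = [0] * 50 + [1] * 50 + [2] * 50
per_class = samples_per_class(toy_target)
print(per_class, sum(per_class))
```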
load_redwine(with_info=False)
Load and return the Red Wine Dataset (classification).
Classes | 10
Samples per class | [1599, 4898]
Samples total | 6497
Dimensionality | 11
Features | int, float, positive
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- bunch : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the dataset.
- (data, target) : tuple, if with_info is False
load_credit(with_info=False)
Load and return the Credit Approval prediction dataset (classification).
Classes | 2
Samples total | 690
Dimensionality | 15
Features | int, float, positive
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the credit csv dataset.
- (data, target) : tuple, if with_info is False
load_magic(with_info=False)
Load and return the Magic Gamma Telescope dataset (classification).
Classes | 2
Samples per class | [6688, 12332]
Samples total | 19020
Dimensionality | 10
Features | float
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the magic csv dataset.
- (data, target) : tuple, if with_info is False
load_haberman(with_info=False)
Load and return the Haberman’s Survival Data Set (classification).
Classes | 2
Samples per class | [225, 81]
Samples total | 306
Dimensionality | 3
Features | int
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the haberman csv dataset.
- (data, target) : tuple, if with_info is False
load_mammographic(with_info=False)
Load and return the Mammographic Mass Data Set (classification).
Classes | 2
Samples per class | [516, 445]
Samples total | 961
Dimensionality | 5
Features | int
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the mammographic csv dataset.
- (data, target) : tuple, if with_info is False
load_indian_liver(with_info=False)
Load and return the Indian Liver Patient Data Set (classification).
Classes | 2
Samples per class | [416, 167]
Samples total | 583
Dimensionality | 10
Features | int, float
Missing Values | 4 (nan)
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- data : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the indian liver csv dataset.
- (data, target) : tuple, if with_info is False
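Since this dataset lists 4 missing values (nan), one simple option before training is to drop the affected rows. The sketch below uses toy data and a hypothetical helper drop_rows_with_nan. Whether the loader already removes or imputes the missing values is not stated here, so check the loaded data first.

```python
import math


def drop_rows_with_nan(X, y):
    """Keep only the rows of X (and matching labels in y) that contain no NaN."""
    kept = [
        (row, label)
        for row, label in zip(X, y)
        if not any(isinstance(v, float) and math.isnan(v) for v in row)
    ]
    X_clean = [row for row, _ in kept]
    y_clean = [label for _, label in kept]
    return X_clean, y_clean


# Toy data: the second row has a missing value and is removed.
X = [[65, 0.7], [62, float("nan")], [58, 1.0]]
y = [0, 1, 0]
X_clean, y_clean = drop_rows_with_nan(X, y)
print(len(X_clean), y_clean)
```

Imputation (for example with the column mean) is an alternative when losing rows is undesirable.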
load_mnist_features_resnet18(with_info=False, split=False)
Load and return the MNIST Data Set features extracted using a pretrained ResNet18 neural network (classification).
Classes | 10
Samples per class Train | [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
Samples per class Test | [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]
Samples total Train | 60000
Samples total Test | 10000
Samples total | 70000
Dimensionality | 512
Features | float
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
- split : boolean, default=False.
If True, returns a dictionary instead of an array in the place of the data.
Returns
- bunch : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the MNIST ResNet18 features csv dataset. If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.
- (data, target) : tuple, if with_info is False
If split=False, data is an array. If split=True, data is a dictionary with ‘train’ and ‘test’ splits.
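Code that should work with either value of split can branch on the type of the returned data. The sketch below uses toy values and a hypothetical helper train_test_parts. It only covers the data object, because the text above does not say whether target is split in the same way.

```python
def train_test_parts(data):
    """Return (train, test) for split-style data, or (data, None) otherwise."""
    if isinstance(data, dict):
        return data["train"], data["test"]
    return data, None


# Toy stand-in for the dictionary described for split=True.
split_data = {"train": [[0.1, 0.2], [0.3, 0.4]], "test": [[0.5, 0.6]]}
train, test = train_test_parts(split_data)
print(len(train), len(test))

# Toy stand-in for the single array described for split=False.
whole, no_test = train_test_parts([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(len(whole), no_test)
```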
load_catsvsdogs_features_resnet18(with_info=False)
Load and return the Cats vs Dogs Data Set features extracted using a pretrained ResNet18 neural network (classification).
Classes | 2
Samples per class | [11658, 11604]
Samples total | 23262
Dimensionality | 512
Features | float
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
Returns
- bunch : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the Cats vs Dogs ResNet18 features csv dataset.
- (data, target) : tuple, if with_info is False
load_yearbook_features_resnet18(with_info=False, with_attributes=False)
Load and return the Yearbook Data Set features extracted using a pretrained ResNet18 neural network (classification).
Classes | 2
Samples per class | [20248, 17673]
Samples total | 37921
Dimensionality | 512
Features | float
Parameters
- with_info : boolean, default=False.
If True, returns a Bunch object instead of (data, target). See below for more information about the data and target objects.
- with_attributes : boolean, default=False.
If True, returns an additional dictionary with further attributes of the portraits: year, state, city and school. The key ‘attr_labels’ in the dictionary contains the label of each column, while ‘attr_data’ contains the attribute data as a numpy array.
Returns
- bunch : Bunch, if with_info is True
Dictionary-like object; the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification targets, ‘DESCR’, the full description of the dataset, and ‘filename’, the physical location of the Yearbook ResNet18 features csv dataset.
- (data, target) : tuple, if with_info is False
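The attribute dictionary returned with with_attributes=True pairs column labels with columns of data, so a single attribute can be extracted by looking up its label. The sketch below uses a toy dictionary with nested lists in place of the numpy array. The helper attribute_column is hypothetical, and the exact label strings (such as "year") are assumptions.

```python
def attribute_column(attributes, name):
    """Return the column of attr_data whose label in attr_labels equals name."""
    col = list(attributes["attr_labels"]).index(name)
    return [row[col] for row in attributes["attr_data"]]


# Toy stand-in for the attribute dictionary; real values will differ.
toy_attributes = {
    "attr_labels": ["year", "state", "city", "school"],
    "attr_data": [
        [1950, "CA", "Fresno", "A"],
        [1987, "NY", "Albany", "B"],
    ],
}
years = attribute_column(toy_attributes, "year")
print(years)
```

Row-wise indexing as used here also works on a two-dimensional numpy array, although slicing the column directly is more idiomatic in that case.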