Getting started
Installation and Setup
Installation
The latest built version of MRCpy can be installed with pip:
```
pip install MRCpy
```
Alternatively, the latest development code of MRCpy can be installed from the GitHub source repository:
```
git clone https://github.com/MachineLearningBCAM/MRCpy.git
cd MRCpy
python3 setup.py install
```
You can then run `pytest tests` to execute all the checks (this requires the pytest package).
Note
The cvxpy-based solver in the library uses the GUROBI optimizer, which requires a license. Gurobi offers a free academic license.
Dependencies
- Python >= 3.9
- numpy >= 1.19
- scipy >= 1.4.1
- scikit-learn >= 0.22
- cvxpy >= 1.1
- pandas >= 1.0
- pyarrow
- gurobipy (requires license)
- pycddlib >= 3.0.2 (required only for LMRC)
Note
Installing pycddlib requires the GMP library, and `pip install pycddlib` alone may not be sufficient. See the pycddlib installation guide for details on installing GMP and other dependencies.
Optional (for PyTorch MGCE classifier):
- torch >= 1.9.0
- tqdm >= 4.50.0
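To see which of the listed dependencies are already present in an environment, a short standard-library script can help. This is a minimal sketch and not part of MRCpy: the distribution names are taken from the lists above, the grouping into `REQUIRED` and `OPTIONAL` is an assumption made for illustration, and the script only reports installed versions without comparing them against the minimum versions.

```python
from importlib import metadata

# Distribution names copied from the dependency lists above.
# Grouping pycddlib with the optional packages is an illustrative choice,
# since it is required only for LMRC.
REQUIRED = ["numpy", "scipy", "scikit-learn", "cvxpy", "pandas", "pyarrow", "gurobipy"]
OPTIONAL = ["pycddlib", "torch", "tqdm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, version in installed_versions(names).items():
            print(f"{group:9s} {name:14s} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` in the required group would need to be installed before MRCpy can be used with the corresponding solver.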
Quick start
This example loads the mammographic dataset and trains the MRC classifier
using the 0-1 loss (the default loss).
```python
from MRCpy import MRC
from MRCpy.datasets import load_mammographic
from sklearn.model_selection import train_test_split

# Load the mammographic dataset
X, y = load_mammographic(with_info=False)

# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the MRC classifier using default loss (0-1)
clf = MRC()

# Fit the classifier on the training data
clf.fit(X_train, y_train)

# Bounds on the classification error (only for MRC)
lower_error = clf.get_lower_bound()
upper_error = clf.get_upper_bound()

# Compute the accuracy on the test set
accuracy = clf.score(X_test, y_test)
```
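The quick start produces two kinds of numbers: bounds on the classification error and an accuracy on the test set. Since the 0-1 error is one minus the accuracy, the two can be compared directly. The following is a minimal pure-Python sketch of that comparison; the labels and bound values are hypothetical, and `zero_one_error` and `within_bounds` are illustrative helpers, not MRCpy functions.

```python
def zero_one_error(y_true, y_pred):
    """Fraction of misclassified samples, i.e. the empirical 0-1 loss."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)


def within_bounds(error, lower, upper):
    """True if the observed error lies in the closed interval [lower, upper]."""
    return lower <= error <= upper


# Hypothetical labels and bound values, for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

error = zero_one_error(y_true, y_pred)  # 2 mistakes out of 10 samples
accuracy = 1.0 - error
print(f"error={error:.2f}, accuracy={accuracy:.2f}")
print("inside bounds:", within_bounds(error, lower=0.15, upper=0.25))
```

With a fitted classifier, the same check could be applied to `1 - accuracy`, `lower_error`, and `upper_error` from the quick start. The test error is a finite-sample estimate, so it may fall outside the interval on a particular split.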
Dataset Loaders
MRCpy provides loader functions for a variety of standard datasets in `MRCpy.datasets`. The adult, magic, mnist, cats vs dogs, and yearbook datasets are not available in the built version of MRCpy (the one installed through pip).
Usage
All loaders follow the same interface. By default they return (X, y) arrays:
```python
from MRCpy.datasets import load_mammographic

# Returns (X, y) arrays
X, y = load_mammographic(with_info=False)

# Returns a Bunch object with .data, .target, .DESCR, .filename
dataset = load_mammographic(with_info=True)
```
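Code that accepts the output of a loader sometimes has to handle both return forms. The helper below is a minimal sketch of one way to normalize them; `as_arrays` is a hypothetical name, not an MRCpy function, and `SimpleNamespace` stands in for the Bunch object so the example runs without MRCpy installed.

```python
from types import SimpleNamespace


def as_arrays(loaded):
    """Return (X, y) from either an (X, y) pair or a Bunch-like object."""
    if hasattr(loaded, "data") and hasattr(loaded, "target"):
        return loaded.data, loaded.target
    X, y = loaded  # assumes a two-element (X, y) sequence
    return X, y


# Stand-ins for loader outputs (toy values). A real call would look like
# load_mammographic(with_info=True) or load_mammographic(with_info=False).
fake_bunch = SimpleNamespace(
    data=[[1.0, 2.0], [3.0, 4.0]],
    target=[0, 1],
    DESCR="toy data",
    filename="toy.csv",
)
fake_pair = ([[1.0, 2.0], [3.0, 4.0]], [0, 1])

X, y = as_arrays(fake_bunch)
print(X, y)
print(as_arrays(fake_pair) == (X, y))
```

Checking for the `data` and `target` attributes, rather than for a specific class, keeps the helper independent of the exact container type the loader returns.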
Available Datasets
| Loader | Classes | Samples | Features |
|---|---|---|---|
|  | 3 | 150 | 4 |
|  | 2 | 306 | 3 |
|  | 2 | 961 | 5 |
|  | 2 | 668 | 8 |
|  | 2 | 690 | 15 |
|  | 2 | 583 | 10 |
|  | 6 | 214 | 9 |
|  | 8 | 336 | 8 |
|  | 4 | 846 | 18 |
|  | 7 | 2310 | 19 |
|  | 10 | 6497 | 11 |
|  | 6 | 6435 | 36 |
|  | 10 | 5620 | 64 |
|  | 2 | 1500 | 99 |
|  | 2 | 48842 | 14 |
|  | 2 | 19020 | 10 |
|  | 7 | 581012 | 54 |
|  | 26 | 20000 | 16 |
Pre-extracted Feature Datasets
These datasets provide features extracted using a pretrained ResNet18 neural network. They are not available in the pip-installed version.
| Loader | Classes | Samples | Features |
|---|---|---|---|
|  | 10 | 70000 | 512 |
|  | 2 | 23262 | 512 |
|  | 2 | 37921 | 512 |
* Not available in the pip-installed version of MRCpy.
Newsgroup Text Datasets
Binary text classification datasets derived from the 20 Newsgroups corpus, with TF-IDF features (1000 dimensions).
| Loader | Classes | Features |
|---|---|---|
|  | 2 | 1000 |
|  | 2 | 1000 |
|  | 2 | 1000 |
|  | 2 | 1000 |
|  | 2 | 1000 |
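For readers unfamiliar with TF-IDF, the sketch below shows how a fixed-size feature vector can be derived from raw text. It is a textbook variant written in pure Python under simple assumptions (whitespace tokenization, `idf = log(N / df)`, vocabulary capped by a hypothetical `max_features` parameter); it is not the procedure used to build the MRCpy datasets, and libraries such as scikit-learn use smoothed and normalized variants.

```python
import math
from collections import Counter


def tfidf(documents, max_features=1000):
    """Return (vocabulary, matrix) of TF-IDF weights for whitespace-tokenized text."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Keep the max_features most frequent terms (ties broken alphabetically).
    total = Counter(term for tokens in tokenized for term in tokens)
    vocabulary = sorted(sorted(total), key=lambda t: -total[t])[:max_features]
    n_docs = len(documents)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Toy corpus, for illustration only.
docs = ["the gpu driver crashed", "the team won the game", "new gpu benchmark"]
vocab, X = tfidf(docs, max_features=5)
print(vocab)
print([len(row) for row in X])
```

Each document becomes a row with one weight per vocabulary term, so capping the vocabulary at 1000 terms yields 1000-dimensional feature vectors like those listed in the table.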