###############
Getting started
###############

Installation and Setup
======================

**Installation**

The latest built version of ``MRCpy`` can be installed using `pip` as
```
pip install MRCpy
```

Alternatively, the latest development version of ``MRCpy`` can be installed by cloning the GitHub repository and installing from source as
```
git clone https://github.com/MachineLearningBCAM/MRCpy.git
cd MRCpy
python3 setup.py install
```
You may then use ``pytest tests`` to run all the checks (you will need to have the ``pytest`` package installed).

**Dependencies**

- `Python` :math:`\geq` 3.6
- `numpy` :math:`==` 1.25.0
- `scipy` :math:`==` 1.10.0
- `scikit-learn` :math:`==` 1.2.2
- `cvxpy` :math:`==` 1.3.1
- `pandas` :math:`==` 2.2.0
- `mosek`
- `pyarrow`
- `gurobipy`
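
To check which of the packages listed above are present in the current environment, the following minimal sketch queries installed versions with the standard-library ``importlib.metadata`` module. It assumes Python 3.8 or later (where that module is available) and that the distribution names on PyPI match the names in the list; ``installed_versions`` is a hypothetical helper, not part of ``MRCpy``.

```python
from importlib import metadata

# Distribution names copied from the dependency list above (assumed to
# match the names used on PyPI).
REQUIRED = [
    "numpy", "scipy", "scikit-learn", "cvxpy",
    "pandas", "mosek", "pyarrow", "gurobipy",
]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against the pinned ones is left to the reader, since the pins may change between releases.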

Quick start
===========

This example loads the mammographic dataset, and trains the `MRC` classifier
using 0-1 loss (i.e., the default loss).

::

    from MRCpy import MRC
    from MRCpy.datasets import load_mammographic
    from sklearn.model_selection import train_test_split

    # Load the mammographic dataset
    X, Y = load_mammographic(with_info=False)

    # Split the data into training and testing
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

    # Create the MRC classifier using default loss (0-1)
    clf = MRC()

    # Fit the classifier on the training data
    clf.fit(X_train, y_train)

    # Bounds on the classification error (only for MRC)
    lower_error = clf.get_lower_bound()
    upper_error = clf.get_upper_bound()

    # Compute the accuracy on the test set
    accuracy = clf.score(X_test, y_test)

Dataset Loaders
===============
The `MRCpy <https://github.com/MachineLearningBCAM/MRCpy>`_ library bundles a variety of datasets, each with a description and a convenient loader function. The functions that can be imported from ``MRCpy.datasets`` are described next. Note that the datasets ``adult``, ``magic``, ``mnist``, ``cats vs dogs`` and ``yearbook`` are not available in the built version of ``MRCpy`` (installed through pip).


``normalizeLabels(origY)``
--------------------------

Normalize the labels of the instances so that they lie in the range 0, ..., r-1 for r classes.
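
As an illustration of this mapping, the sketch below re-implements the idea in plain Python. It is not the library's code: the helper name ``normalize_labels_sketch`` is hypothetical, and it assumes that classes are numbered in sorted order of their original labels, which may differ from what ``normalizeLabels`` does.

```python
def normalize_labels_sketch(labels):
    """Map arbitrary labels to integers 0, ..., r-1 for r distinct labels.

    Assumption: classes are numbered by the sorted order of the
    original labels.
    """
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


print(normalize_labels_sketch([3, 7, 3, 10]))  # [0, 1, 0, 2]
```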


``load_adult(with_info=False)``
---------------------------------

Load and return the adult incomes prediction dataset (classification).

=================   ==============
Classes                          2
Samples per class    [37155,11687]
Samples total                48842
Dimensionality                  14
Features             int, positive
=================   ==============

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of the dataset.

(data, target) : tuple if ``with_info`` is False
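
The two return shapes can be illustrated with a toy loader. This is a sketch only: ``load_toy`` and its data are hypothetical, and ``types.SimpleNamespace`` stands in for the Bunch object so that the example needs nothing beyond the standard library. It follows the quick start, where ``load_mammographic(with_info=False)`` is unpacked directly into ``X, Y``.

```python
from types import SimpleNamespace


def load_toy(with_info=False):
    """Hypothetical loader mimicking the ``with_info`` convention."""
    data = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2]]
    target = [0, 1, 0]
    if not with_info:
        # Plain tuple, as used in the quick start.
        return data, target
    # Stand-in for a Bunch: same attribute names as described above.
    return SimpleNamespace(
        data=data,
        target=target,
        DESCR="Toy dataset for illustration.",
        filename=None,
    )


X, Y = load_toy()                 # tuple form
bunch = load_toy(with_info=True)  # attribute form
print(bunch.DESCR)
```

The tuple form is the convenient one for feeding a classifier directly, while the attribute form keeps the description and file location alongside the data.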


``load_diabetes(with_info=False)``
-----------------------------------
Load and return the Pima Indians Diabetes dataset (classification).

=================   =====================
Classes                                 2
Samples per class               [500,168]
Samples total                         668
Dimensionality                          8
Features             int, float, positive
=================   =====================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of the dataset.

(data, target) : tuple if ``with_info`` is False


``load_iris(with_info=False)``
-------------------------------
Load and return the Iris Plants Dataset (classification).

=================   =====================
Classes                                 3
Samples per class              [50,50,50]
Samples total                         150
Dimensionality                          4
Features             int, float, positive
=================   =====================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of the dataset.

(data, target) : tuple if ``with_info`` is False

``load_redwine(with_info=False)``
---------------------------------
Load and return the Red Wine Dataset (classification).

=================   =====================
Classes                                10
Samples per class            [1599, 4898]
Samples total                        6497
Dimensionality                         11
Features             int, float, positive
=================   =====================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

bunch : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of the dataset.

(data, target) : tuple if ``with_info`` is False

``load_credit(with_info=False)``
---------------------------------
Load and return the Credit Approval prediction dataset (classification).

=================   =====================
Classes                                 2
Samples total                         690
Dimensionality                         15
Features             int, float, positive
=================   =====================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of credit csv dataset.

(data, target) : tuple if ``with_info`` is False


``load_magic(with_info=False)``
--------------------------------
Load and return the Magic Gamma Telescope dataset (classification).

=================== ======================
Classes                                 2
Samples per class            [6688,12332]
Samples total                       19020
Dimensionality                         10
Features                            float
=================== ======================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of magic csv dataset.

(data, target) : tuple if ``with_info`` is False


``load_haberman(with_info=False)``
-----------------------------------
Load and return the Haberman's Survival Data Set (classification).

====================  ==========
Classes                       2
Samples per class     [225, 81]
Samples total               306
Dimensionality                3
Features                    int
====================  ==========

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of haberman csv dataset.

(data, target) : tuple if ``with_info`` is False


``load_mammographic(with_info=False)``
---------------------------------------
Load and return the Mammographic Mass Data Set (classification).

=================== ===========
Classes                      2
Samples per class    [516, 445]
Samples total              961
Dimensionality               5
Features                   int
=================== ===========

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of mammographic csv dataset.

(data, target) : tuple if ``with_info`` is False
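
The "Samples per class" row of a table can be turned into class proportions, which gives a simple reference point when reading the test accuracy from the quick start. The sketch below does this for the mammographic counts; ``class_priors`` is a hypothetical helper, and the majority-class error is a generic baseline rather than anything computed by ``MRCpy``.

```python
def class_priors(samples_per_class):
    """Return the fraction of samples that belongs to each class."""
    total = sum(samples_per_class)
    return [count / total for count in samples_per_class]


counts = [516, 445]  # "Samples per class" for the mammographic dataset
priors = class_priors(counts)

# Error of a trivial rule that always predicts the most frequent class.
majority_error = 1 - max(priors)
print(round(majority_error, 3))  # 0.463
```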


``load_indian_liver(with_info=False)``
---------------------------------------
Load and return the Indian Liver Patient Data Set
(classification).

========================== ===============================
Classes                                                 2
Samples per class                              [416, 167]
Samples total                                         583
Dimensionality                                         10
Features                                       int, float
Missing Values                                     4 (nan)
========================== ===============================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of indian liver csv dataset.

(data, target) : tuple if ``with_info`` is False

``load_mnist_features_resnet18(with_info=False, split=False)``
--------------------------------------------------------------
Load and return the MNIST Data Set features extracted using a
pretrained ResNet18 neural network (classification).

=======================  ===================================================
Classes                                                                   10
Samples per class Train  [5923,6742,5958,6131,5842,5421,5918,6265,5851,5949]
Samples per class Test        [980,1135,1032,1010,982,892,958,1028,974,1009]
Samples total Train                                                    60000
Samples total Test                                                     10000
Samples total                                                          70000
Dimensionality                                                           512
Features                                                               float
=======================  ===================================================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.
split : boolean, default=False.
    If True, returns a dictionary instead of an array in the place of the
    data.

**Returns**

bunch : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of MNIST ResNet18 features
    csv dataset. If `split=False`, data is
    an array. If `split=True` data is a dictionary with 'train' and 'test'
    splits.

(data, target) : tuple if ``with_info`` is False. If `split=False`, data is
    an array. If `split=True` data is a dictionary with 'train' and 'test'
    splits.


``load_catsvsdogs_features_resnet18(with_info=False)``
------------------------------------------------------
Load and return the Cats vs Dogs Data Set features extracted using a
pretrained ResNet18 neural network (classification).

==================== =======================
Classes                                   2
Samples per class             [11658,11604]
Samples total                         23262
Dimensionality                          512
Features                              float
==================== =======================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

**Returns**

bunch : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of Cats vs Dogs ResNet18 features
    csv dataset.

(data, target) : tuple if ``with_info`` is False


``load_yearbook_features_resnet18(with_info=False, with_attributes=False)``
---------------------------------------------------------------------------
Load and return the Yearbook Data Set features extracted using a
pretrained ResNet18 neural network (classification).

==================== =======================
Classes                                   2
Samples per class             [20248,17673]
Samples total                         37921
Dimensionality                          512
Features                              float
==================== =======================

**Parameters**

with_info : boolean, default=False.
    If False, returns ``(data, target)``. If True, returns a Bunch object
    instead; see below for its attributes.

with_attributes : boolean, default=False.
    If True, also returns a dictionary with additional attributes of the
    portraits: year, state, city, and school. The key 'attr_labels' holds
    the label of each column, and 'attr_data' holds the attribute data as
    a numpy array.

**Returns**

bunch : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification targets,
    'DESCR', the full description of the dataset,
    and 'filename', the physical location of Yearbook ResNet18 features
    csv dataset.

(data, target) : tuple if ``with_info`` is False
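
To show how the attribute dictionary returned with ``with_attributes=True`` might be used, the sketch below looks up one attribute column by its label. Everything here is illustrative: the dictionary is built by hand with placeholder values, nested lists stand in for the numpy array, and ``attribute_column`` is a hypothetical helper rather than part of ``MRCpy``.

```python
# Hand-made stand-in for the attribute dictionary described above.
attributes = {
    "attr_labels": ["year", "state", "city", "school"],
    "attr_data": [
        [1990, "state_a", "city_a", "school_a"],
        [2005, "state_b", "city_b", "school_b"],
    ],
}


def attribute_column(attributes, label):
    """Return the values of one attribute, selected by its label."""
    col = attributes["attr_labels"].index(label)
    return [row[col] for row in attributes["attr_data"]]


print(attribute_column(attributes, "year"))  # [1990, 2005]
```

With a real numpy array, the same selection would typically be written as ``attr_data[:, col]``.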