Minimax Framework
=================

MRCpy [3]_ implements minimax risk classifiers (MRCs) that obtain classification rules by minimizing the worst-case expected loss over an uncertainty set of distributions. This section provides a general overview of the framework underlying all methods in the library, following the formulation in [4]_.

Minimax Risk Classification
---------------------------

Given a loss function :math:`\ell`, an MRC obtains a classification rule :math:`\mathrm{h}^{\mathcal{U}}` that solves:

.. math::

   \mathrm{h}^{\mathcal{U}} \in \arg\min_{\mathrm{h}} \max_{\mathrm{p} \in \mathcal{U}} \ell(\mathrm{h}, \mathrm{p})

where the maximization is over distributions :math:`\mathrm{p}` in an uncertainty set :math:`\mathcal{U}` defined by expectation constraints on feature mappings :math:`\Phi(x, y)`.

Uncertainty Sets
----------------

MRCpy implements two types of uncertainty sets:

* **Uncertainty set** :math:`\mathcal{U}_1`: defined by constraints that bound the expectations of a feature mapping :math:`\Phi(x, y)`:

  .. math::

     \mathcal{U}_1 = \left\{ \mathrm{p} : \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}

  MRCs using :math:`\mathcal{U}_1` (the ``MRC`` class) provide upper and lower bounds on the expected loss [1]_.

* **Uncertainty set** :math:`\mathcal{U}_2`: additionally fixes the instances' marginal distribution to coincide with the empirical marginal:

  .. math::

     \mathcal{U}_2 = \left\{ \mathrm{p} : \mathrm{p}_x = \hat{\mathrm{p}}_x, \; \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}

  MRCs using :math:`\mathcal{U}_2` (the ``CMRC`` class) correspond to popular techniques such as L1-regularized logistic regression, zero-one adversarial classifiers, and maximum entropy machines [2]_.
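To make the quantities :math:`\boldsymbol{\tau}` and :math:`\boldsymbol{\lambda}` appearing in these sets concrete, the following is a minimal sketch in plain numpy, assuming the common one-hot product feature mapping :math:`\Phi(x, y) = \mathrm{e}_y \otimes x` and a standard-error-based width. The function names and the scaling parameter ``s`` are illustrative, not part of MRCpy's API.

```python
import numpy as np

def linear_phi(x, y, n_classes):
    """One-hot product feature map Phi(x, y) = e_y (kron) x."""
    e_y = np.zeros(n_classes)
    e_y[y] = 1.0
    return np.kron(e_y, x)

def estimate_tau_lambda(X, Y, n_classes, s=0.3):
    """Empirical means tau of Phi over the sample, and confidence
    widths lambda given by s times the per-component standard error."""
    n = X.shape[0]
    feats = np.array([linear_phi(X[i], Y[i], n_classes) for i in range(n)])
    tau = feats.mean(axis=0)
    lam = s * feats.std(axis=0) / np.sqrt(n)
    return tau, lam
```

Larger values of ``s`` enlarge the uncertainty set, trading tighter adaptation to the sample for robustness to estimation error.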
In both cases, :math:`\boldsymbol{\tau}` denotes the empirical mean estimates of the feature mapping and :math:`\boldsymbol{\lambda}` controls the size of the uncertainty set based on the estimation accuracy.

Feature Mappings
----------------

MRCs use feature mappings to represent instance-label pairs as real vectors. MRCpy implements several feature mappings:

* **Linear (identity)**: direct use of the input features. See :class:`~MRCpy.phi.BasePhi`.
* **Random Fourier features**: approximation of kernel methods via random projections. See :class:`~MRCpy.phi.RandomFourierPhi`.
* **Random ReLU features**: non-linear random features using ReLU activations. See :class:`~MRCpy.phi.RandomReLUPhi`.
* **Threshold features**: binary features based on thresholding input dimensions. See :class:`~MRCpy.phi.ThresholdPhi`.

All feature mappings can be combined with any MRC variant.

Loss Functions
--------------

MRCpy supports two main types of loss functions:

* **0-1 loss**: directly quantifies the probability of classification error. Unlike common techniques that rely on surrogate losses, MRCs can utilize the 0-1 loss directly.
* **Log-loss**: quantifies the negative log-likelihood of a classification rule (cross-entropy).

The library implements multiple techniques within the minimax risk classification framework; see the references below for details.

References
----------

.. [1] Mazuelas, S., Romero, M., & Grünwald, P. (2023). Minimax Risk Classifiers with 0-1 Loss. Journal of Machine Learning Research, 24(208), 1-48.

.. [2] Mazuelas, S., Shen, Y., & Pérez, A. (2022). Generalized Maximum Entropy for Supervised Classification. IEEE Transactions on Information Theory, 68(4), 2530-2550.

.. [3] Bondugula, K., Álvarez, V., Segovia-Martín, J. I., Pérez, A., & Mazuelas, S. (2021). MRCpy: A Library for Minimax Risk Classifiers. arXiv preprint arXiv:2108.01952.

.. [4] Mazuelas, S., Zanoni, A., & Pérez, A. (2020). Minimax Classification with 0-1 Loss and Performance Guarantees.
   Advances in Neural Information Processing Systems, 33, 302-312.