Minimax Framework¶
MRCpy [3] implements minimax risk classifiers (MRCs) that obtain classification rules by minimizing the worst-case expected loss over an uncertainty set of distributions. This section provides a general overview of the framework underlying all methods in the library, following the formulation in [4].
Minimax Risk Classification¶
Given a loss function \(\ell\), an MRC obtains a classification rule \(\mathrm{h}^{\mathcal{U}}\) that solves:
\[\mathrm{h}^{\mathcal{U}} \in \arg\min_{\mathrm{h}} \max_{\mathrm{p} \in \mathcal{U}} \mathbb{E}_{\mathrm{p}}\left[\ell(\mathrm{h}, (x, y))\right]\]
where the maximization is over distributions \(\mathrm{p}\) in an uncertainty set \(\mathcal{U}\) defined by expectation constraints on feature mappings \(\Phi(x, y)\).
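To build intuition for the minimax criterion, the following toy sketch (illustration only, not MRCpy code: the outcomes, distributions, and candidate rules are made up) evaluates the worst-case expected 0-1 loss of two fixed rules over a small finite set of distributions. A real MRC minimizes over all rules and a continuous uncertainty set.

```python
import numpy as np

# Toy setup: three (x, y) outcomes, an "uncertainty set" of two candidate
# distributions over those outcomes, and two fixed rules given as
# predicted labels per outcome.
outcomes_y = np.array([0, 1, 1])           # true label of each outcome
U = np.array([[0.5, 0.3, 0.2],             # distribution p1 over outcomes
              [0.2, 0.3, 0.5]])            # distribution p2 over outcomes
rules = np.array([[0, 1, 1],               # rule h1: predicted labels
                  [0, 0, 1]])              # rule h2

def worst_case_01_loss(pred, ps, y):
    """Max over distributions p of the expected 0-1 loss of a rule."""
    errors = (pred != y).astype(float)     # 0-1 loss per outcome
    return max(p @ errors for p in ps)

losses = [worst_case_01_loss(h, U, outcomes_y) for h in rules]
best = int(np.argmin(losses))              # minimax rule among the candidates
print(losses, best)
```

Here the minimax choice is simply the candidate rule whose worst-case expected loss is smallest.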
Uncertainty Sets¶
MRCpy implements two types of uncertainty sets:
Uncertainty set \(\mathcal{U}_1\): Defined by constraints that bound the expectations of a feature mapping \(\Phi(x, y)\):
\[\mathcal{U}_1 = \left\{ \mathrm{p} : \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}\]
MRCs using \(\mathcal{U}_1\) (the MRC class) provide upper and lower bounds on the expected loss [1].
Uncertainty set \(\mathcal{U}_2\): Adds a constraint that fixes the instances’ marginal distribution to coincide with the empirical marginal:
\[\mathcal{U}_2 = \left\{ \mathrm{p} : \mathrm{p}_x = \hat{\mathrm{p}}_x, \; \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}\]
MRCs using \(\mathcal{U}_2\) (the CMRC class) correspond to popular techniques such as L1-regularized logistic regression, zero-one adversarial classifiers, and maximum entropy machines [2].
In both cases, \(\boldsymbol{\tau}\) denotes the empirical mean estimates of the feature mappings and \(\boldsymbol{\lambda}\) controls the size of the uncertainty set based on the estimation accuracy.
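As a concrete sketch of these estimates (using an assumed convention, not MRCpy internals): with a linear feature mapping \(\Phi(x, y) = \text{one-hot}(y) \otimes x\), \(\boldsymbol{\tau}\) is the sample mean of \(\Phi\) over the training pairs, and one common choice sets \(\boldsymbol{\lambda}\) proportional to the per-coordinate standard deviation divided by \(\sqrt{n}\).

```python
import numpy as np

# Sketch of the estimates tau and lambda for a linear feature mapping
# Phi(x, y) = one_hot(y) kron x (assumed convention, not MRCpy internals).
rng = np.random.default_rng(0)
n, d, n_classes = 100, 3, 2
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)

def phi(x, label, n_classes):
    one_hot = np.zeros(n_classes)
    one_hot[label] = 1.0
    return np.kron(one_hot, x)             # vector of length n_classes * d

Phi = np.array([phi(x, lbl, n_classes) for x, lbl in zip(X, y)])
tau = Phi.mean(axis=0)                     # empirical mean estimates
lam = Phi.std(axis=0) / np.sqrt(n)         # shrinks as n grows
print(tau.shape, lam.shape)
```

Because \(\boldsymbol{\lambda}\) shrinks with the sample size, the uncertainty set contracts around the empirical expectations as more data becomes available.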
Feature Mappings¶
MRCs use feature mappings to represent instance-label pairs as real vectors. MRCpy implements several feature mappings:
Linear (identity): Direct use of input features. See BasePhi.
Random Fourier features: Approximation of kernel methods via random projections. See RandomFourierPhi.
Random ReLU features: Non-linear random features using ReLU activations. See RandomReLUPhi.
Threshold features: Binary features based on thresholding input dimensions. See ThresholdPhi.
All feature mappings can be combined with any MRC variant.
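To illustrate the random Fourier idea, here is a minimal numpy sketch (illustration only, not the RandomFourierPhi implementation) approximating the Gaussian kernel \(k(x, x') = \exp(-\|x - x'\|^2 / (2 s^2))\) via \(z(x) = \sqrt{2/D}\,\cos(W^\top x + b)\), so that \(z(x)^\top z(x') \approx k(x, x')\):

```python
import numpy as np

# Random Fourier feature sketch: inner products of the random features
# approximate the Gaussian kernel (the sampling scheme below is the
# standard one, not necessarily MRCpy's exact parametrization).
rng = np.random.default_rng(0)
d, D, s = 5, 5000, 1.0                     # input dim, n. features, bandwidth
W = rng.normal(scale=1.0 / s, size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

x1, x2 = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.sum((x1 - x2) ** 2) / (2 * s ** 2))
approx = rff(x1) @ rff(x2)
print(exact, approx)                       # the two values should be close
```

The approximation error decreases roughly as \(1/\sqrt{D}\), so increasing the number of random features trades computation for kernel fidelity.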
Loss Functions¶
MRCpy supports two main types of loss functions:
0-1 loss: Directly quantifies the probability of classification error. Unlike common techniques that rely on surrogate losses, MRCs can utilize the 0-1 loss directly.
Log-loss: Quantifies the negative log-likelihood for a classification rule (cross-entropy).
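The two losses can be computed side by side; the following sketch (a made-up probabilistic rule on three examples, not MRCpy code) shows the 0-1 error rate of the most-likely-class prediction next to the cross-entropy of the same rule:

```python
import numpy as np

# Sketch of the two losses for a probabilistic classification rule h(y|x)
# on three examples with two classes (illustration only).
probs = np.array([[0.9, 0.1],              # h(y|x) for each example
                  [0.4, 0.6],
                  [0.2, 0.8]])
y_true = np.array([0, 1, 0])

# 0-1 loss: fraction of errors when predicting the most likely class.
preds = probs.argmax(axis=1)
zero_one = np.mean(preds != y_true)

# Log-loss: negative log-likelihood of the true label (cross-entropy).
log_loss = -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
print(zero_one, log_loss)
```

Note how the log-loss penalizes the confidently wrong third prediction much more heavily than the 0-1 loss, which only counts it as a single error.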
The library implements multiple techniques within the minimax risk classification framework; see the references below for details.