Minimax Framework

MRCpy [3] implements minimax risk classifiers (MRCs) that obtain classification rules by minimizing the worst-case expected loss over an uncertainty set of distributions. This section provides a general overview of the framework underlying all methods in the library, following the formulation in [4].

Minimax Risk Classification

Given a loss function \(\ell\), an MRC obtains a classification rule \(\mathrm{h}^{\mathcal{U}}\) that solves:

\[\mathrm{h}^{\mathcal{U}} \in \arg\min_{\mathrm{h}} \max_{\mathrm{p} \in \mathcal{U}} \ell(\mathrm{h}, \mathrm{p})\]

where the maximization is over distributions \(\mathrm{p}\) in an uncertainty set \(\mathcal{U}\) defined by expectation constraints on feature mappings \(\Phi(x, y)\).
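The minimax optimization above can be illustrated with a toy computation. The following sketch is plain Python, not MRCpy code: the finite uncertainty set `U`, the single-instance setting, and the randomized rule parameterized by `q` are invented for illustration. It finds the rule minimizing the worst-case expected 0-1 loss over `U` by grid search:

```python
# Toy minimax illustration (not MRCpy code): one instance x, binary labels.
# A randomized rule h predicts class 1 with probability q.
# Each p in the uncertainty set U is a candidate value of p(y = 1 | x).
# Expected 0-1 loss: l(h, p) = q * (1 - p) + (1 - q) * p.

U = [0.3, 0.5, 0.7]  # finite uncertainty set (illustrative)

def expected_01_loss(q, p):
    # Probability that the randomized prediction disagrees with the label.
    return q * (1 - p) + (1 - q) * p

def worst_case_loss(q):
    # Inner maximization over the uncertainty set.
    return max(expected_01_loss(q, p) for p in U)

# Outer minimization over rules, here by grid search over q.
grid = [i / 100 for i in range(101)]
q_star = min(grid, key=worst_case_loss)
```

Here the minimax rule is `q_star = 0.5` with worst-case loss 0.5: hedging evenly is optimal because the uncertainty set is symmetric around 0.5. MRCs solve the analogous (continuous, constrained) problem via convex optimization rather than grid search.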

Uncertainty Sets

MRCpy implements two types of uncertainty sets:

  • Uncertainty set \(\mathcal{U}_1\): Defined by constraints that bound the expectations of a feature mapping \(\Phi(x, y)\):

    \[\mathcal{U}_1 = \left\{ \mathrm{p} : \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}\]

    MRCs using \(\mathcal{U}_1\) (the MRC class) provide upper and lower bounds on the expected loss [1].

  • Uncertainty set \(\mathcal{U}_2\): Adds a constraint that fixes the instances’ marginal distribution to coincide with the empirical marginal:

    \[\mathcal{U}_2 = \left\{ \mathrm{p} : \mathrm{p}_x = \hat{\mathrm{p}}_x, \; \left| \mathbb{E}_{\mathrm{p}}[\Phi(x,y)] - \boldsymbol{\tau} \right| \leq \boldsymbol{\lambda} \right\}\]

    MRCs using \(\mathcal{U}_2\) (the CMRC class) correspond to popular techniques such as L1-regularized logistic regression, zero-one adversarial classifiers, and maximum entropy machines [2].

In both cases, \(\boldsymbol{\tau}\) denotes the vector of empirical mean estimates of the feature mappings, and \(\boldsymbol{\lambda}\) controls the size of the uncertainty set by accounting for the estimation error in \(\boldsymbol{\tau}\).
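The estimates \(\boldsymbol{\tau}\) and \(\boldsymbol{\lambda}\) can be sketched numerically as follows. This is an illustration under assumed conventions, not MRCpy internals: the one-hot block layout of `phi` and the \(1/\sqrt{n}\) scaling of `lam` are common choices shown for intuition.

```python
import numpy as np

# Illustrative estimation of tau and lambda from labeled data.
# Phi(x, y) places x in the block corresponding to label y (a common
# one-hot-by-label construction; layout assumed for this sketch).

rng = np.random.default_rng(0)
n, d, n_classes = 100, 3, 2
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)

def phi(x, label):
    out = np.zeros(n_classes * d)
    out[label * d:(label + 1) * d] = x
    return out

features = np.array([phi(x, lab) for x, lab in zip(X, y)])
tau = features.mean(axis=0)              # empirical mean of the feature mapping
lam = features.std(axis=0) / np.sqrt(n)  # shrinks as estimates become accurate
```

The key point is that `lam` shrinks with the sample size: with more data, \(\boldsymbol{\tau}\) is estimated more accurately and the uncertainty set tightens around the true expectations.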

Feature Mappings

MRCs use feature mappings to represent instance-label pairs as real vectors. MRCpy implements several feature mappings:

  • Linear (identity): Direct use of input features. See BasePhi.

  • Random Fourier features: Approximation of kernel methods via random projections. See RandomFourierPhi.

  • Random ReLU features: Non-linear random features using ReLU activations. See RandomReLUPhi.

  • Threshold features: Binary features based on thresholding input dimensions. See ThresholdPhi.

All feature mappings can be combined with any MRC variant.
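The random Fourier construction can be sketched in a few lines of NumPy. This is an illustrative approximation of the RBF kernel, not the RandomFourierPhi implementation; the bandwidth `sigma` and feature count `D` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 2, 2000, 1.0

# Random Fourier features approximating the RBF kernel
# k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
W = rng.normal(scale=1.0 / sigma, size=(D, d))  # random projection directions
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phase offsets

def rff(x):
    # Inner products of these features approximate the kernel.
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = np.array([0.5, -0.2])
z = np.array([0.1, 0.3])
approx = rff(x) @ rff(z)
exact = np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
```

For large `D`, `approx` concentrates around `exact`, which is what lets a linear classifier over random features emulate a kernel method.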

Loss Functions

MRCpy supports two main types of loss functions:

  • 0-1 loss: Quantifies the probability of classification error. Unlike common techniques that rely on surrogate losses, MRCs can minimize the 0-1 loss directly.

  • Log-loss: Quantifies the negative log-likelihood for a classification rule (cross-entropy).

The library implements multiple techniques within the minimax risk classification framework; see the references below for details.

References