`MRCpy.phi`.RandomFourierPhi

class MRCpy.phi.RandomFourierPhi(n_classes, fit_intercept=True, sigma='scale', n_components=600, random_state=None, one_hot=False)

Fourier features

Features obtained by approximating the RBF kernel with the random Fourier feature map:

\[z(x) = \sqrt{2/D} \, [\cos(w_1^\top x), \ldots, \cos(w_{D/2}^\top x), \sin(w_1^\top x), \ldots, \sin(w_{D/2}^\top x)]\]

where each \(w_i\) (\(i = 1, \ldots, D/2\)) is a d-dimensional vector (d being the number of input features) of random weights drawn from a Gaussian distribution with mean 0 and variance \(1/\sigma^2\) in each entry, and D is the number of components in the resulting feature map. The parameter \(\sigma\) in the variance is the scaling parameter of the radial basis function kernel:

\[K(x, x') = \exp{\frac{-\| x-x'\|^2}{2\sigma^2}}\]
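As an illustration of the map above, the following NumPy sketch builds \(z(x)\) from D/2 Gaussian weight vectors and checks that \(z(x)^\top z(x')\) is close to the RBF kernel value. It is a standalone sketch, not MRCpy's `transform`: the helper names `rff_map` and `rbf_kernel` are hypothetical, and the weights are drawn with standard deviation \(1/\sigma\) (variance \(1/\sigma^2\)), which is the choice under which the inner product matches \(K(x, x')\) as written above.

```python
import numpy as np

def rff_map(X, W):
    """Hypothetical random Fourier feature map z(x) for each row of X.

    W has shape (n_features, D/2): one column per random weight vector w_i.
    Returns an array of shape (n_samples, D): sqrt(2/D) * [cos(XW), sin(XW)].
    """
    XW = X @ W                      # projections w_i^T x, shape (n_samples, D/2)
    D = 2 * W.shape[1]              # total number of components
    return np.sqrt(2.0 / D) * np.hstack((np.cos(XW), np.sin(XW)))

def rbf_kernel(x, x_prime, sigma):
    """Exact RBF kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - x_prime) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n_features, D, sigma = 5, 20000, 1.5
# Each entry of each w_i ~ N(0, 1/sigma^2), i.e. standard deviation 1/sigma.
W = rng.normal(0.0, 1.0 / sigma, size=(n_features, D // 2))

x, x_prime = rng.normal(size=n_features), rng.normal(size=n_features)
Z = rff_map(np.vstack((x, x_prime)), W)
approx = Z[0] @ Z[1]                # z(x)^T z(x') = (2/D) * sum_i cos(w_i^T (x - x'))
exact = rbf_kernel(x, x_prime, sigma)
print(f"approx={approx:.4f} exact={exact:.4f}")
```

The approximation error shrinks roughly like \(1/\sqrt{D}\), which is why a moderately large number of components is used in practice.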

Note that when using Random Fourier feature mapping, training and testing instances are expected to be normalized.

See also

For more information about random features, see:

[1] Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. In NIPS (Vol. 3, No. 4, p. 5).

For more information about MRC, one can refer to the following resources:

[2] Mazuelas, S., Zanoni, A., & Pérez, A. (2020). Minimax Classification with 0-1 Loss and Performance Guarantees. Advances in Neural Information Processing Systems, 33, 302-312.

[3] Mazuelas, S., Shen, Y., & Pérez, A. (2020). Generalized Maximum Entropy for Supervised Classification. arXiv preprint arXiv:2007.05447.

[4] Bondugula, K., Mazuelas, S., & Pérez, A. (2021). MRCpy: A Library for Minimax Risk Classifiers. arXiv preprint arXiv:2108.01952.

Parameters:

n_classes : int

Number of classes in the dataset.

fit_intercept : bool, default = True

Whether to calculate the intercept. If set to False, no intercept is used in the calculations (i.e., the data is expected to be already centered).

one_hot : bool, default = False

Controls the method used to evaluate the features of the given instances in the binary case, i.e., only when there are two classes. If set to True, one-hot encoding is used. If set to False, a more efficient shortcut is used.

sigma : str or float, default = ‘scale’

When given a string, it names the heuristic used to compute the scaling parameter sigma from the data. For comparison, its relation with the parameter gamma used in other methods is \(\gamma=1/(2\sigma^2)\). When given a float, it is the value of the scaling parameter.

‘scale’: Approximates sigma by \(\sqrt{\frac{\textrm{n_features} \cdot \textrm{var}(X)}{2}}\) so that gamma is \(\frac{1}{\textrm{n_features} \cdot \textrm{var}(X)}\), where var is the variance function.
‘scale2’: Approximates sigma by \(\sqrt{\frac{\textrm{n_features}}{2}}\) so that gamma is \(\frac{1}{\textrm{n_features}}\).
‘avg_ann_50’: Approximates sigma by the average distance to the \(50^{\textrm{th}}\) nearest neighbour, estimated from 1000 samples of the dataset using the method rff_sigma.
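The two data-driven closed-form heuristics and the sigma-to-gamma conversion can be sketched as below. This is an illustration, not MRCpy's code: the function names are hypothetical, and var(X) is assumed to be the variance over all entries of X (the same convention scikit-learn uses for its gamma='scale' option).

```python
import numpy as np

def sigma_scale(X):
    """Hypothetical 'scale': sigma = sqrt(n_features * var(X) / 2).

    Assumption: var(X) is the variance over all entries of X.
    """
    X = np.asarray(X, dtype=float)
    return np.sqrt(X.shape[1] * X.var() / 2.0)

def sigma_scale2(X):
    """Hypothetical 'scale2': sigma = sqrt(n_features / 2); ignores the data values."""
    return np.sqrt(np.asarray(X).shape[1] / 2.0)

def gamma_from_sigma(sigma):
    """Gamma used by other RBF implementations: gamma = 1 / (2 sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

rng = np.random.RandomState(0)
X = rng.normal(scale=2.0, size=(200, 4))
for name, sigma in [("scale", sigma_scale(X)), ("scale2", sigma_scale2(X))]:
    print(name, sigma, gamma_from_sigma(sigma))
```

With 4 features, 'scale2' gives sigma = sqrt(2) and therefore gamma = 1/4, matching the formula above.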

n_components : int, default = 600

Number of components D of the random Fourier feature map, i.e., the number of features into which each input instance is transformed.

random_state : int, RandomState instance, default = None

Random seed (or RandomState instance) used to generate random_weights_ for the approximation of the Gaussian kernel.

Attributes:

random_weights_ : array-like of shape (n_features, n_components/2). Random weights applied to the training samples as a step in computing the random Fourier features.
is_fitted_ : bool. Whether the feature mapping has learned its hyperparameters (if any) and the length of the feature mapping is set.
len_ : int. Length of the feature mapping vector.

Methods

`est_exp`(X, Y)	Average value of \(\phi(x,y)\) in the supervised dataset (X,Y).
`est_std`(X, Y)	Standard deviation of \(\phi(x,y)\) in the supervised dataset (X,Y).
`eval_x`(X)	Evaluates the one-hot encoded features \(\phi(x,y)\) of the given instances X, for every x \(\in\) X and every label y.
`eval_xy`(X, Y)	Evaluates the one-hot encoded features \(\phi(x,y)\) of the given instances X, for x \(\in\) X and its corresponding label y \(\in\) Y.
`fit`(X[, Y])	Learns the set of random weights for computing the features.
`rff_sigma`(X)	Computes the scaling parameter for the Fourier features using the heuristic given in the paper "Compact Nonlinear Maps and Circulant Extensions".
`transform`(X)	Computes the random Fourier features \(z(x)\).

__init__(n_classes, fit_intercept=True, sigma='scale', n_components=600, random_state=None, one_hot=False)

est_exp(X, Y)

Average value of \(\phi(x,y)\) in the supervised dataset (X,Y). Used in the learning stage to estimate the expectation of \(\phi(x,y)\), denoted by \({\tau}\).

Parameters:

X : array-like of shape (n_samples, n_dimensions). Unlabeled training instances.
Y : array-like of shape (n_samples,). Labels corresponding to the unlabeled training instances.

Returns:

tau_ : array-like of shape (n_features * n_classes). Average value of phi.
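A minimal sketch of the quantity est_exp returns, assuming that \(\phi(x,y)\) places the per-instance feature vector (for example, the random Fourier features of x) in the block of label y and zeros elsewhere. The block layout and the name `est_exp_sketch` are hypothetical. Because only one block is non-zero per instance, \(\tau\) can be computed block by block: block y is the sum of the feature vectors of the instances labelled y, divided by the total number of samples.

```python
import numpy as np

def est_exp_sketch(F, Y, n_classes):
    """Hypothetical tau: mean of phi(x_i, y_i) over the supervised sample.

    F : (n_samples, m) per-instance feature vectors.
    Y : (n_samples,) integer labels in {0, ..., n_classes - 1}.
    Returns an array of length m * n_classes, one block of size m per class.
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y)
    n_samples = F.shape[0]
    # Block y averages over ALL samples, but only instances labelled y contribute.
    blocks = [F[Y == y].sum(axis=0) / n_samples for y in range(n_classes)]
    return np.concatenate(blocks)

F = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y = np.array([0, 1, 0])
tau = est_exp_sketch(F, Y, n_classes=2)
print(tau)  # values 2, 8/3, 1, 4/3
```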

est_std(X, Y)

Standard deviation of \(\phi(x,y)\) in the supervised dataset (X,Y). Used in the learning stage to estimate the variance in the expectation of \(\phi(x,y)\), denoted by \(\lambda\).

Parameters:

X : array-like of shape (n_samples, n_dimensions). Unlabeled training instances.
Y : array-like of shape (n_samples,). Labels corresponding to the unlabeled training instances.

Returns:

lambda_ : array-like of shape (n_features * n_classes). Standard deviation of phi.
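Under the same hypothetical block layout for \(\phi(x,y)\), the per-coordinate standard deviation can be sketched with the identity \(\mathrm{Var} = E[\phi^2] - E[\phi]^2\), again block by block so the full (n_samples, m * n_classes) matrix is never built. The name `est_std_sketch` is hypothetical, and the population standard deviation (ddof = 0) is an assumption.

```python
import numpy as np

def est_std_sketch(F, Y, n_classes):
    """Hypothetical lambda: population std (ddof=0) of phi(x_i, y_i).

    F : (n_samples, m) per-instance feature vectors.
    Y : (n_samples,) integer labels in {0, ..., n_classes - 1}.
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y)
    n = F.shape[0]
    # First and second moments of each coordinate of phi, one block per class.
    mean = np.concatenate([F[Y == y].sum(axis=0) / n for y in range(n_classes)])
    second = np.concatenate([(F[Y == y] ** 2).sum(axis=0) / n for y in range(n_classes)])
    # Clip tiny negative values caused by floating-point cancellation.
    return np.sqrt(np.maximum(second - mean ** 2, 0.0))

F = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y = np.array([0, 1, 0])
lam = est_std_sketch(F, Y, n_classes=2)
print(lam)
```

Computing the two moments per block keeps memory proportional to m * n_classes rather than n_samples * m * n_classes.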

eval_x(X)

Evaluates the one-hot encoded features \(\phi(x,y)\) of the given instances X, for every x \(\in\) X and every possible label y. The output is a 3D array composed of one 2D matrix per instance; each 2D matrix contains the one-hot encodings of that instance's features for all the possible labels in the data.

Parameters:

X : array-like of shape (n_samples, n_dimensions). Unlabeled training instances for developing the feature matrix.

Returns:

phi : array-like of shape (n_samples, n_classes, n_features * n_classes)

Matrix containing the one-hot encoding for all the classes for each of the instances given.
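The shape of this output can be illustrated with a small sketch, assuming the same hypothetical block layout as above (features of x copied into block y, zeros elsewhere). `eval_x_sketch` is a hypothetical name; slice [i, y] of its result plays the role of \(\phi(x_i, y)\).

```python
import numpy as np

def eval_x_sketch(F, n_classes):
    """Hypothetical eval_x: phi(x, y) for every instance and every label.

    F : (n_samples, m) per-instance feature vectors.
    Returns shape (n_samples, n_classes, m * n_classes).
    """
    F = np.asarray(F, dtype=float)
    n_samples, m = F.shape
    phi = np.zeros((n_samples, n_classes, n_classes * m))
    for y in range(n_classes):
        # For label y, copy the features into block y and leave other blocks zero.
        phi[:, y, y * m:(y + 1) * m] = F
    return phi

F = np.array([[1.0, 2.0], [3.0, 4.0]])
phi = eval_x_sketch(F, n_classes=3)
print(phi.shape)  # (2, 3, 6)
print(phi[1])
```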

eval_xy(X, Y)

Evaluates the one-hot encoded features \(\phi(x,y)\) of the given instances X, for x \(\in\) X and its corresponding label y \(\in\) Y. The encodings are computed for the given labels and are used in the learning stage to estimate the expectation of \(\phi(x,y)\).

Parameters:

X : array-like of shape (n_samples, n_dimensions). Unlabeled training instances for developing the feature matrix.
Y : array-like of shape (n_samples,). Labels corresponding to the unlabeled training instances.

Returns:

phi : array-like of shape (n_samples, n_features * n_classes)

Matrix containing the one-hot encoding with respect to the labels given for all the instances.
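A vectorised sketch of this label-specific encoding, again assuming the hypothetical block layout in which the features of \(x_i\) occupy block \(y_i\). `eval_xy_sketch` is a hypothetical name; it scatters each row into a (n_samples, n_classes, m) buffer with fancy indexing and then concatenates the blocks with a reshape.

```python
import numpy as np

def eval_xy_sketch(F, Y, n_classes):
    """Hypothetical eval_xy: phi(x_i, y_i) for each instance and its own label.

    F : (n_samples, m) per-instance feature vectors.
    Y : (n_samples,) integer labels in {0, ..., n_classes - 1}.
    Returns shape (n_samples, m * n_classes).
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y)
    n_samples, m = F.shape
    phi = np.zeros((n_samples, n_classes, m))
    phi[np.arange(n_samples), Y] = F               # put row i into block Y[i]
    return phi.reshape(n_samples, n_classes * m)   # concatenate the blocks per row

F = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([2, 0])
phi = eval_xy_sketch(F, Y, n_classes=3)
print(phi)
```

Each row of this result equals the slice for the instance's own label in the 3D eval_x output under the same layout.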

fit(X, Y=None)

Learns the set of random weights for computing the features. If sigma is given as a heuristic name (a string) rather than a float, also computes the scaling parameter from X.

Parameters:

X : array-like of shape (n_samples, n_dimensions). Unlabeled training instances used to learn the feature configurations.
Y : array-like of shape (n_samples,), default = None. Not used by this feature mapping; present only so that the fit signature is consistent across the different feature mappings.

Returns:

self: Fitted estimator
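A rough sketch of the two steps fit is described as performing: resolve sigma (only the 'scale' and 'scale2' heuristics from the parameter list are covered here), then draw weights of shape (n_features, n_components/2) seeded by random_state. `fit_weights` is a hypothetical helper, not MRCpy's fit, and drawing the weights with standard deviation \(1/\sigma\) is an assumption chosen to match the RBF kernel.

```python
import numpy as np

def fit_weights(X, sigma="scale", n_components=600, random_state=None):
    """Hypothetical fit: returns (sigma, random_weights) for the RFF map.

    Assumption: weights ~ N(0, 1/sigma^2), shape (n_features, n_components // 2).
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if isinstance(sigma, str):
        if sigma == "scale":
            sigma = np.sqrt(n_features * X.var() / 2.0)
        elif sigma == "scale2":
            sigma = np.sqrt(n_features / 2.0)
        else:
            raise ValueError(f"heuristic {sigma!r} is not covered by this sketch")
    # Accept either a seed (int or None) or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    W = rng.normal(0.0, 1.0 / sigma, size=(n_features, n_components // 2))
    return sigma, W

X = np.random.RandomState(42).normal(size=(100, 3))
sigma_a, W_a = fit_weights(X, n_components=600, random_state=0)
sigma_b, W_b = fit_weights(X, n_components=600, random_state=0)
print(W_a.shape)                  # (3, 300)
print(np.array_equal(W_a, W_b))   # True
```

Fixing random_state makes the weights, and therefore the features, reproducible across runs.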

rff_sigma(X)

Computes the scaling parameter for the Fourier features using the heuristic given in the paper “Compact Nonlinear Maps and Circulant Extensions”.

The heuristic states that the scaling parameter is obtained as the average distance to the 50th nearest neighbour estimated from 1000 samples of the dataset.
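The heuristic can be sketched in NumPy as follows. This is an illustration under stated assumptions, not MRCpy's rff_sigma: the subsample of at most 1000 points is drawn without replacement, a point is not counted as its own neighbour, and `avg_kth_neighbour_distance` is a hypothetical name.

```python
import numpy as np

def avg_kth_neighbour_distance(X, k=50, n_samples=1000, random_state=None):
    """Hypothetical sketch: mean Euclidean distance to the k-th nearest neighbour.

    Uses a random subsample of at most n_samples rows (without replacement).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.RandomState(random_state)
    if X.shape[0] > n_samples:
        X = X[rng.choice(X.shape[0], n_samples, replace=False)]
    if X.shape[0] <= k:
        raise ValueError("need more than k points")
    # Pairwise distances via ||a||^2 + ||b||^2 - 2 a.b (avoids a 3D difference array).
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    dist.sort(axis=1)
    return dist[:, k].mean()

X = np.random.RandomState(0).normal(size=(300, 4))
sigma_est = avg_kth_neighbour_distance(X, k=50, random_state=0)
print(sigma_est)
```

For a tiny check, the points 0, 1 and 3 on a line have nearest-neighbour distances 1, 1 and 2, so k=1 gives 4/3.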
