MRCpy.phi.ThresholdPhi

class MRCpy.phi.ThresholdPhi(n_classes, fit_intercept=True, n_thresholds=200, one_hot=False)[source]

Threshold features

A threshold feature is a function \(f(x_d,t)\) that equals 1 when \(x_d<t\) and 0 otherwise, for an instance x, a dimension d, and a threshold t in that dimension. A product of threshold features is the indicator of a region, and its expectation is closely related to cumulative distribution functions. This class obtains the thresholds by fitting multiple one-dimensional decision stumps on the training data.
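The definition can be illustrated with a minimal NumPy sketch. The helper `threshold_feature` and the toy data are hypothetical and are not part of MRCpy. The sketch shows that a product of two threshold features marks the samples inside a region, and that averaging an indicator gives an empirical estimate of a cumulative probability.

```python
import numpy as np


def threshold_feature(X, dim, t):
    """Hypothetical helper: return 1 where X[:, dim] < t and 0 otherwise."""
    X = np.asarray(X, dtype=float)
    return (X[:, dim] < t).astype(int)


# Toy data: four instances with two dimensions each.
X_demo = np.array([[0.2, 1.5],
                   [0.7, 0.3],
                   [0.4, 0.9],
                   [0.9, 0.1]])

f0 = threshold_feature(X_demo, dim=0, t=0.5)  # indicator of x_0 < 0.5
f1 = threshold_feature(X_demo, dim=1, t=1.0)  # indicator of x_1 < 1.0

# The product is the indicator of the region {x_0 < 0.5 and x_1 < 1.0}.
region = f0 * f1

# Averages of indicators are empirical probabilities:
# the fraction of samples with x_0 < 0.5, and the fraction inside the region.
cdf_dim0 = f0.mean()
region_mass = region.mean()
```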

Parameters:
n_classes : int

Number of classes in the dataset.

fit_intercept : bool, default = True

Whether to calculate the intercept. If set to false, no intercept will be used in calculations (i.e. data is expected to be already centered).

one_hot : bool, default = False

Controls how the features of the given instances are evaluated. It applies only in the binary case, that is, when there are two classes. If set to true, one-hot encoding is used. If set to false, a more efficient shortcut is used.

n_thresholds : int, default = 200

Maximum number of allowed threshold values for each dimension.

Attributes:
self.thrsVal : array-like of shape (n_thresholds)

Threshold values learned from the training data.

self.thrsDim : array-like of shape (n_thresholds)

Dimensions corresponding to the learned threshold values in self.thrsVal.

is_fitted_ : bool

Whether the feature mapping has learned its hyperparameters (if any) and the length of the feature mapping is set.

len_ : int

Length of the feature mapping vector.

Methods

d_tree_split(X, Y[, n_thresholds])

Learn the univariate thresholds by using the split points of decision trees for each dimension of data.

est_exp(X_transform, Y)

Computes the average value of \(\Phi(x,y)\) to estimate \(\boldsymbol{\tau}\), which defines the constraints of the uncertainty set of distributions.

est_std(X_transform, Y, tau_mat)

Standard deviation of \(\Phi(x,y)\) that accounts for inaccuracies in the mean estimate \(\boldsymbol{\tau}\).

eval_x(X)

Evaluates the one-hot encoded features \(\Phi(x,y)\) for every instance \(x \in X\) and all the labels.

eval_xy(X, Y)

Evaluates the one-hot encoded features \(\Phi(x,y)\) for every instance \(x \in X\) and its label \(y \in Y\).

fit(X[, Y])

Learns the set of thresholds using one-dimensional decision stumps obtained from the dataset.

transform(X)

Compute the threshold features (0/1) by comparing with the thresholds obtained in each dimension.

__init__(n_classes, fit_intercept=True, n_thresholds=200, one_hot=False)[source]

Initialize self. See help(type(self)) for accurate signature.

d_tree_split(X, Y, n_thresholds=None)[source]

Learn the univariate thresholds by using the split points of decision trees for each dimension of data.

Parameters:
X : array-like of shape (n_samples, n_dimensions)

Unlabeled instances.

Y : array-like of shape (n_samples,)

Labels corresponding to the instances.

n_thresholds : int, default = None

Maximum number of thresholds obtained.

Returns:
prodThrsDim : array-like of shape (n_thresholds)

Dimension in which the thresholds are defined.

prodThrsVal : array-like of shape (n_thresholds)

Threshold value in the corresponding dimension.
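The idea of learning one threshold per dimension from a decision stump can be sketched as follows. This is a simplified stand-in, not the library's routine: the hypothetical helper `stump_thresholds` searches the midpoints between sorted unique values of each dimension and keeps the single split with the lowest weighted Gini impurity, whereas `d_tree_split` relies on decision-tree split points and can return up to `n_thresholds` values.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 1-D label array (0 for an empty array)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)


def stump_thresholds(X, Y):
    """Hypothetical helper: best single split per dimension."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    dims, vals = [], []
    for d in range(X.shape[1]):
        values = np.unique(X[:, d])  # sorted unique values in dimension d
        if values.size < 2:
            continue  # constant column: nothing to split
        candidates = (values[:-1] + values[1:]) / 2.0  # midpoints
        best_t, best_score = None, np.inf
        for t in candidates:
            left = Y[X[:, d] < t]
            right = Y[X[:, d] >= t]
            # Weighted Gini impurity of the two sides of the split.
            score = (left.size * gini(left)
                     + right.size * gini(right)) / Y.size
            if score < best_score:
                best_t, best_score = t, score
        dims.append(d)
        vals.append(best_t)
    return np.array(dims, dtype=int), np.array(vals)


# Toy data: dimension 0 separates the classes, dimension 1 is constant.
X_demo = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])
Y_demo = np.array([0, 0, 1, 1])
thrs_dim, thrs_val = stump_thresholds(X_demo, Y_demo)
```

The two returned arrays play the same roles as prodThrsDim and prodThrsVal: entry j of the first array names the dimension in which threshold j of the second array is defined.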

est_exp(X_transform, Y)

Computes the average value of \(\Phi(x,y)\) to estimate \(\boldsymbol{\tau}\), which defines the constraints of the uncertainty set of distributions.

Parameters:
X_transform : array-like of shape (n_samples, n_features)

Features \(\psi(x)\) corresponding to the training instances.

Y : array-like of shape (n_samples,)

Labels corresponding to the unlabeled training instances.

Returns:
tau_ : array-like of shape (n_classes, n_features) or (1, n_features)

Empirical mean of \(\Phi(x,y)\).
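A minimal sketch of this empirical mean, assuming the full one-hot layout with output shape (n_classes, n_features): row c averages \(\psi(x_i)\) over all samples, counting only those with label c. The helper `empirical_tau` is hypothetical, takes `n_classes` as an explicit argument for self-containment, and does not cover the (1, n_features) binary shortcut.

```python
import numpy as np


def empirical_tau(X_transform, Y, n_classes):
    """Hypothetical helper: empirical mean of the one-hot features."""
    X_transform = np.asarray(X_transform, dtype=float)
    Y = np.asarray(Y, dtype=int)
    n_samples = X_transform.shape[0]
    one_hot = np.eye(n_classes)[Y]  # (n_samples, n_classes)
    # Row c sums psi(x_i) over samples with label c, then divides by n_samples.
    return one_hot.T @ X_transform / n_samples


# Toy transformed features (first column acts as an intercept) and labels.
psi_demo = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [1.0, 1.0],
                     [1.0, 0.0]])
labels_demo = np.array([0, 0, 1, 1])
tau_demo = empirical_tau(psi_demo, labels_demo, n_classes=2)
```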

est_std(X_transform, Y, tau_mat)

Standard deviation of \(\Phi(x,y)\) that accounts for inaccuracies in the mean estimate \(\boldsymbol{\tau}\). It is used to estimate \(\boldsymbol{\lambda}\) defining the uncertainty set constraints.

Parameters:
X_transform : array-like of shape (n_samples, n_features)

Features \(\psi(x)\) corresponding to the training instances.

Y : array-like of shape (n_samples,)

Labels corresponding to the unlabeled training instances.

Returns:
lambda_ : array-like of shape (n_classes, n_features) or (1, n_features)

Standard deviation of \(\Phi(x,y)\).
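As a rough illustration, the sketch below computes the population standard deviation of each one-hot feature component around a given mean matrix. The helper `feature_std` is hypothetical. It assumes that `tau_mat` holds the empirical mean in the full (n_classes, n_features) layout; any normalization or scaling that the library applies on top of this is not reproduced here.

```python
import numpy as np


def feature_std(X_transform, Y, tau_mat):
    """Hypothetical helper: per-component standard deviation around tau_mat."""
    X_transform = np.asarray(X_transform, dtype=float)
    Y = np.asarray(Y, dtype=int)
    tau_mat = np.asarray(tau_mat, dtype=float)
    n_classes = tau_mat.shape[0]
    one_hot = np.eye(n_classes)[Y]  # (n_samples, n_classes)
    # phi[i, c, :] equals psi(x_i) if y_i == c and zeros otherwise.
    phi = one_hot[:, :, None] * X_transform[:, None, :]
    # Root of the mean squared deviation from tau_mat, taken over samples.
    return np.sqrt(np.mean((phi - tau_mat[None, :, :]) ** 2, axis=0))


psi_demo = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [1.0, 1.0],
                     [1.0, 0.0]])
labels_demo = np.array([0, 0, 1, 1])
tau_demo = np.array([[0.5, 0.25],
                     [0.5, 0.25]])  # empirical mean for this toy data
lambda_demo = feature_std(psi_demo, labels_demo, tau_demo)
```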

eval_x(X)

Evaluates the one-hot encoded features \(\Phi(x,y)\) for every instance \(x \in X\) and all the labels. The output is a 3D array composed of one 2D matrix per instance. Each 2D matrix holds the one-hot encodings of that instance's features for every possible label in the data.

Parameters:
X : array-like of shape (n_samples, n_dimensions)

Unlabeled training instances for developing the feature matrix.

Returns:
phi : array-like of shape (n_samples, n_classes, n_features * n_classes)

Matrix containing the one-hot encoding for all the classes for each of the instances given.
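The shape of this output can be illustrated with a short sketch. The hypothetical helper `one_hot_all_labels` starts from already transformed features rather than raw instances, and it assumes a class-major layout in which the block for class c occupies columns c * n_features to (c + 1) * n_features. The library's internal ordering may differ.

```python
import numpy as np


def one_hot_all_labels(X_feat, n_classes):
    """Hypothetical helper: one-hot encoding of features for every label."""
    X_feat = np.asarray(X_feat, dtype=float)
    n_samples, n_features = X_feat.shape
    phi = np.zeros((n_samples, n_classes, n_classes * n_features))
    for c in range(n_classes):
        # For label c, copy the features into the c-th block of columns.
        phi[:, c, c * n_features:(c + 1) * n_features] = X_feat
    return phi


X_feat_demo = np.array([[1, 0],
                        [1, 1]])
phi_all = one_hot_all_labels(X_feat_demo, n_classes=2)
```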

eval_xy(X, Y)

Evaluates the one-hot encoded features \(\Phi(x,y)\) for every instance \(x \in X\) paired with its label \(y \in Y\). The encodings are computed for the given labels and are used by the learning stage to estimate the expectation of \(\Phi(x,y)\).

Parameters:
X : array-like of shape (n_samples, n_dimensions)

Unlabeled training instances for developing the feature matrix.

Y : array-like of shape (n_samples,)

Labels corresponding to the unlabeled training instances.

Returns:
phi : array-like of shape (n_samples, n_features * n_classes)

Matrix containing the one-hot encoding with respect to the labels given for all the instances.
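A comparable sketch for the labeled case: each row keeps only the block of columns that belongs to the given label. The helper `one_hot_given_labels` is hypothetical, works on already transformed features, and assumes the same class-major block layout as an illustration only.

```python
import numpy as np


def one_hot_given_labels(X_feat, Y, n_classes):
    """Hypothetical helper: one-hot encoding of features for given labels."""
    X_feat = np.asarray(X_feat, dtype=float)
    Y = np.asarray(Y, dtype=int)
    n_samples = X_feat.shape[0]
    one_hot = np.eye(n_classes)[Y]  # (n_samples, n_classes)
    # Outer product per sample, flattened so class blocks sit side by side.
    phi = one_hot[:, :, None] * X_feat[:, None, :]
    return phi.reshape(n_samples, -1)


X_feat_demo = np.array([[1, 0],
                        [1, 1]])
Y_demo = np.array([1, 0])
phi_xy = one_hot_given_labels(X_feat_demo, Y_demo, n_classes=2)
```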

fit(X, Y=None)[source]

Learns the set of thresholds using one-dimensional decision stumps obtained from the dataset.

Parameters:
X : array-like of shape (n_samples, n_dimensions)

Unlabeled training instances used to learn the feature configurations.

Y : array-like of shape (n_samples,), default = None

Labels corresponding to the unlabeled instances X, used for finding the thresholds from the dataset.

Returns:
self :

Fitted estimator

transform(X)[source]

Compute the threshold features (0/1) by comparing with the thresholds obtained in each dimension.

Parameters:
X : array-like of shape (n_samples, n_dimensions)

Unlabeled training instances.

Returns:
X_feat : array-like of shape (n_samples, n_features)

Transformed features from the given instances.
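The comparison step can be sketched in NumPy as below. The helper `threshold_transform` is hypothetical. It uses the strict comparison \(x_d<t\) from the class description, and it assumes that the intercept, when requested, is a leading column of ones. Both details may differ from the library's implementation.

```python
import numpy as np


def threshold_transform(X, thrs_dim, thrs_val, fit_intercept=True):
    """Hypothetical helper: 0/1 threshold features from learned thresholds."""
    X = np.asarray(X, dtype=float)
    thrs_dim = np.asarray(thrs_dim, dtype=int)
    thrs_val = np.asarray(thrs_val, dtype=float)
    # Column j holds the indicator X[:, thrs_dim[j]] < thrs_val[j].
    feats = (X[:, thrs_dim] < thrs_val[None, :]).astype(int)
    if fit_intercept:
        # Assumed layout: a leading column of ones acts as the intercept.
        ones = np.ones((X.shape[0], 1), dtype=int)
        feats = np.hstack([ones, feats])
    return feats


X_demo = np.array([[0.2, 1.5],
                   [0.7, 0.3]])
# One threshold in dimension 0 and two thresholds in dimension 1.
X_feat_demo = threshold_transform(X_demo,
                                  thrs_dim=[0, 1, 1],
                                  thrs_val=[0.5, 1.0, 0.2])
```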