conformal.nonconformity

The nonconformity module contains nonconformity scores for classification and regression.

class conformal.nonconformity.ClassNC[source]

Bases: object

Base class for classification nonconformity scores.

Extending classes should implement the fit() and nonconformity() methods; a minimal sketch of such a subclass follows below.

fit(data)[source]

Process the data used for later calculation of nonconformities.

Parameters: data (Table) – Data set.
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.
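
A minimal sketch of the interface, assuming ClassNC is importable from conformal.nonconformity (the frequency-based score below is a hypothetical illustration, not part of the module):

>>> from collections import Counter
>>> class ClassFrequencyNC(ClassNC):
...     """Hypothetical score: instances of rare classes are less conforming."""
...     def fit(self, data):
...         self.counts = Counter(int(inst.get_class()) for inst in data)
...         self.n = len(data)
...     def nonconformity(self, instance):
...         return 1 - self.counts[int(instance.get_class())] / self.n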

class conformal.nonconformity.ClassModelNC(classifier)[source]

Bases: conformal.nonconformity.ClassNC

Base class for classification nonconformity scores that are based on an underlying classifier.

Extending classes should implement the ClassNC.nonconformity() method.

learner

Untrained underlying classifier.

model

Trained underlying classifier.

__init__(classifier)[source]

Store the provided classifier as learner.

fit(data)[source]

Train the underlying classifier on the provided data and store the trained model.

class conformal.nonconformity.InverseProbability(classifier)[source]

Bases: conformal.nonconformity.ClassModelNC

The inverse probability nonconformity score returns \(1 - p\), where \(p\) is the probability assigned to the actual class by the underlying classification model (ClassModelNC.model).

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> tp = TransductiveClassifier(InverseProbability(NaiveBayesLearner()), train)
>>> print(tp(test[0], 0.1))
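
The fitted score can also be queried directly through the fit()/nonconformity() interface (reusing train and test from the example above):

>>> nc = InverseProbability(NaiveBayesLearner())
>>> nc.fit(train)
>>> nc.nonconformity(test[0])  # equals 1 - probability of the actual class of test[0]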
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.ProbabilityMargin(classifier)[source]

Bases: conformal.nonconformity.ClassModelNC

The probability margin nonconformity score measures the difference \(d_p\) between the predicted probability of the actual class and the largest probability assigned to any other class. To put the values on a scale from 0 to 1, the nonconformity function returns \((1 - d_p) / 2\). For example, if the model assigns probability 0.7 to the actual class and at most 0.2 to any other class, then \(d_p = 0.5\) and the score is 0.25.

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> tp = TransductiveClassifier(ProbabilityMargin(LogisticRegressionLearner()), train)
>>> print(tp(test[0], 0.1))
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.SVMDistance(classifier)[source]

Bases: conformal.nonconformity.ClassNC

The SVMDistance nonconformity score measures the distance from the SVM’s decision boundary. The score depends on the distance and on the side of the decision boundary that the example lies on. Examples that lie on the correct side of the decision boundary, and would therefore be predicted correctly by the SVM classifier, have a nonconformity score below 1, while incorrectly predicted examples have a score above 1.

\[\begin{split}\mathit{nc} = \begin{cases} \frac{1}{1+d} & \text{correct}\\ 1+d &\text{incorrect} \end{cases}\end{split}\]

The provided SVM classifier must be one of sklearn’s SVM classifiers (SVC, LinearSVC, NuSVC), which provide the decision_function() method for computing the distance to the decision boundary. This nonconformity score works only for binary classification problems.
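
A hypothetical sketch of this scoring rule, assuming d is the absolute distance from the decision boundary (e.g. the absolute value of sklearn’s decision_function) and correct indicates whether the example lies on the correct side:

>>> def svm_distance_nc(d, correct):
...     # correct side: scores below 1; incorrect side: scores above 1
...     return 1 / (1 + d) if correct else 1 + d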

Examples

>>> from sklearn.svm import SVC
>>> train, test = next(LOOSampler(Table('titanic')))
>>> train, calibrate = next(RandomSampler(train, 2, 1))
>>> icp = InductiveClassifier(SVMDistance(SVC()), train, calibrate)
>>> print(icp(test[0], 0.1))
__init__(classifier)[source]

Store the provided SVM classifier.

fit(data)[source]

Process the data used for later calculation of nonconformities.

Parameters: data (Table) – Data set.
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.NearestNeighbours(distance=Euclidean(), k=1)[source]

Bases: object

Base class for nonconformity measures based on nearest neighbours.

distance

Distance measure.

k

Number of nearest neighbours.

Type: int
__init__(distance=Euclidean(), k=1)[source]

Store the distance measure and the number of neighbours.

fit(data)[source]

Store the data for finding nearest neighbours.

neighbours(instance)[source]

Compute distances to all other data instances using the distance measure (distance).

Excludes data instances that are equal to the provided instance.

Returns: List of pairs (distance, instance) in increasing order of distances.
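
For example (reusing train and test from the examples on this page, with Orange’s Euclidean distance):

>>> from Orange.distance import Euclidean
>>> nn = NearestNeighbours(Euclidean(), k=3)
>>> nn.fit(train)
>>> d, z = nn.neighbours(test[0])[0]  # distance to, and instance of, the closest neighbour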
class conformal.nonconformity.ClassNearestNeighboursNC(distance=Euclidean(), k=1)[source]

Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.ClassNC

Base class for nearest neighbours based classification nonconformity scores.

class conformal.nonconformity.KNNDistance(distance=Euclidean(), k=1)[source]

Bases: conformal.nonconformity.ClassNearestNeighboursNC

Computes the sum of distances to the k nearest neighbours of the same class as the given instance and the sum of distances to the k nearest neighbours of other classes, and returns their ratio.
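
A hypothetical sketch of the ratio, assuming same and other hold the distances to the k nearest neighbours of the same class and of other classes:

>>> def knn_distance_nc(same, other):
...     # relatively distant same-class neighbours -> high nonconformity
...     return sum(same) / sum(other)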

Examples

>>> from Orange.distance import Euclidean
>>> train, test = next(LOOSampler(Table('iris')))
>>> cp = CrossClassifier(KNNDistance(Euclidean(), 10), 2, train)
>>> print(cp(test[0], 0.1))
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.KNNFraction(distance=Euclidean(), k=1, weighted=False)[source]

Bases: conformal.nonconformity.ClassNearestNeighboursNC

Computes the k nearest neighbours of the given instance. Returns the fraction of instances of the same class as the given instance within its k nearest neighbours.

The weighted version uses weights \(1/d_i\) based on the distances instead of simply counting the instances; the non-weighted version is equivalent to using weight 1 for all instances.
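
A hypothetical sketch of this computation, assuming neighbours is a list of (distance, is_same_class) pairs for the k nearest neighbours:

>>> def knn_fraction(neighbours, weighted=False):
...     weights = [1 / d if weighted else 1 for d, _ in neighbours]
...     same = sum(w for w, (_, s) in zip(weights, neighbours) if s)
...     return same / sum(weights)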

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> cp = CrossClassifier(KNNFraction(Euclidean(), 10, weighted=True), 2, train)
>>> print(cp(test[0], 0.1))
__init__(distance=Euclidean(), k=1, weighted=False)[source]

Store the distance measure and the number of neighbours.

nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.LOOClassNC(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]

Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.ClassNC

Leave-one-out classification nonconformity combines the local prediction error within the instance’s neighbourhood with the probability assigned to its actual class:

\[\mathit{nc} = \mathit{error} + (1 - p) \quad \text{or} \quad \mathit{nc} = \frac{1 - p}{\mathit{error}}\]

\(p\) … the probability of the actual class predicted from \(N_k(z^*)\), the k nearest neighbours of the instance \(z^*\)

The first nonconformity score is used when the parameter relative is set to False and the second one when it is set to True.

\[\mathit{error} = \frac {\sum_{z_i \in N_k(z^*)} w_i (1 - p_i)} {\sum_{z_i \in N_k(z^*)} w_i}, \quad w_i = \frac{1}{d(x^*, x_i)}\]

\(p_i\) … the probability of the actual class predicted from \(N_k(z') \setminus \{z_i\}\), or \((N_k(z') \setminus \{z_i\}) \cup \{z^*\}\) if the parameter include is set to True. \(z'\) is \(z^*\) if the neighbourhood parameter is ‘fixed’ and \(z_i\) if it is ‘variable’.
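
A sketch of the error term above, assuming errors holds the values \(1 - p_i\) for the neighbours and dists the corresponding distances \(d(x^*, x_i)\):

>>> def loo_error(errors, dists):
...     weights = [1 / d for d in dists]
...     return sum(w * e for w, e in zip(weights, errors)) / sum(weights)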

__init__(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]

Initialize the parameters.

fit(data)[source]

Store the data for finding nearest neighbours and initialize cache.

get_neighbourhood(inst)[source]

Construct an Orange data Table consisting of the instance’s k nearest neighbours. Cache the result for later calls with the same instance.

error(inst, neighbours)[source]

Compute the average weighted probability prediction error for predicting the actual class of each neighbour from the other ones. Include the new example among the neighbours if the parameter include is True.

nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

class conformal.nonconformity.RegrNC[source]

Bases: object

Base class for regression nonconformity scores.

Extending classes should implement the fit(), nonconformity() and predict() methods.

fit(data)[source]

Process the data used for later calculation of nonconformities.

Parameters: data (Table) – Data set.
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.RegrModelNC(classifier)[source]

Bases: conformal.nonconformity.RegrNC

Base class for regression nonconformity scores that are based on an underlying regression model.

Extending classes should implement the RegrNC.nonconformity() and RegrNC.predict() methods.

learner

Untrained underlying regressor.

model

Trained underlying regressor.

__init__(classifier)[source]

Store the provided classifier as learner.

fit(data)[source]

Train the underlying regressor on the provided data and store the trained model.

class conformal.nonconformity.AbsError(classifier)[source]

Bases: conformal.nonconformity.RegrModelNC

The absolute error nonconformity score returns the absolute difference between the value predicted by the underlying RegrModelNC.model (\(\hat{y}\)) and the actual value (\(y^{*}\)).

\[\mathit{nc} = |\hat{y}-y^{*}|\]
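
Inverting this score for predict() gives the interval \(\hat{y} \pm \mathit{nc}\). A worked example with hypothetical numbers:

>>> y_hat, nc = 22.5, 3.0  # hypothetical prediction and nonconformity threshold
>>> (y_hat - nc, y_hat + nc)  # all y with |y_hat - y| <= nc
(19.5, 25.5)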

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AbsError(LinearRegressionLearner()), 2, train)
>>> print(cr(test[0], 0.1))
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.AbsErrorRF(classifier, rf, beta=0.5)[source]

Bases: conformal.nonconformity.RegrModelNC

AbsErrorRF is based on an underlying regressor and a random forest. The prediction errors of the regressor are used as nonconformity scores and are normalized by the standard deviation of the predictions of the individual trees in the forest.

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\sigma_\mathit{RF} + \beta}\]
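
A sketch of the normalization term \(\sigma_\mathit{RF}\), assuming a fitted sklearn RandomForestRegressor rf and a single feature row x (a list of feature values):

>>> import numpy as np
>>> def rf_std(rf, x):
...     # spread of the individual trees' predictions for x
...     preds = [tree.predict([x])[0] for tree in rf.estimators_]
...     return np.std(preds)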

Examples

>>> from sklearn.ensemble import RandomForestRegressor
>>> icr = InductiveRegressor(AbsErrorRF(RandomForestRegressionLearner(), RandomForestRegressor()))
>>> r = run(icr, 0.1, CrossSampler(Table('housing'), 10))
>>> print(r.accuracy(), r.median_range(), r.interdecile_mean())
__init__(classifier, rf, beta=0.5)[source]

Store the classifier, the random forest and the beta parameter.

fit(data)[source]

Train the underlying regressor and the random forest on the provided data and store the trained models.

norm(inst)[source]

The normalization factor is equal to the standard deviation of the predictions of the trees in the random forest, plus the constant term beta.

nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.ErrorModelNC(classifier, error_classifier, beta=0.5, loo=False)[source]

Bases: conformal.nonconformity.RegrModelNC

ErrorModelNC is based on two underlying regressors. The first one is trained to predict the value, while the second one is used to predict the logarithms of the errors made by the first one.

H. Papadopoulos and H. Haralambous. Reliable prediction intervals with regression neural networks. Neural Networks (2011).

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\exp(\mu)-1 + \beta}\]

\(\mu\) … the second regressor’s prediction of \(\log(|\hat{y}-y^{*}|+1)\)

The parameter loo determines whether a leave-one-out scheme is used for building the training set of errors for the second regressor.
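
A sketch of the score, assuming mu is the second regressor’s prediction of \(\log(|\hat{y}-y^{*}|+1)\):

>>> import numpy as np
>>> def error_model_nc(y_hat, y, mu, beta=0.5):
...     # the denominator recovers the predicted absolute error from its logarithm
...     return abs(y_hat - y) / (np.exp(mu) - 1 + beta)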

Examples

>>> nc = ErrorModelNC(SVRLearner(), LinearRegressionLearner())
>>> icr = InductiveRegressor(nc)
>>> r = run(icr, 0.1, CrossSampler(Table('housing'), 10))
>>> print(r.accuracy(), r.median_range(), r.interdecile_mean())
__init__(classifier, error_classifier, beta=0.5, loo=False)[source]

Store the provided regressors as learners.

fit(data)[source]

Train the underlying regressors on the provided data and store the trained models.

nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.ExperimentalNC(rf)[source]

Bases: conformal.nonconformity.RegrModelNC

__init__(rf)[source]

Store the provided random forest as learner.

fit(data)[source]

Train the underlying model on the provided data and store the trained model.

norm(inst)[source]
nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.AbsErrorNormalized(classifier, distance=Euclidean(), k=10, gamma=0.5, rho=0.5, exp=True, rf=None)[source]

Bases: conformal.nonconformity.RegrModelNC, conformal.nonconformity.NearestNeighbours

Normalized absolute error uses an underlying regression model to predict the value; the absolute prediction error is then normalized by the distance to and the variance of the nearest neighbours.

H. Papadopoulos, V. Vovk and A. Gammerman. Regression Conformal Prediction with Nearest Neighbours. Journal of Artificial Intelligence Research (2011).

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\exp(\gamma \lambda^*) + \exp(\rho \xi^*)} \quad \text{or} \quad \mathit{nc} = \frac{|\hat{y}-y^{*}|}{\gamma + \lambda^* + \xi^*}\]

The first nonconformity score is used when the parameter exp is set to True and the second one when it is set to False.

\[\lambda^* = \frac{d_k(z^*)}{\mathit{median}(\{d_k(z), z \in T\})}, \quad d_k(z) = \sum_{z_i \in N_k(z)} \mathit{distance}(x, x_i)\]
\[\xi^* = \frac{\sigma_k(z^*)}{\mathit{median}(\{\sigma_k(z), z \in T\})}, \quad \sigma_k(z) = \sqrt{\frac{1}{k} \sum_{z_i \in N_k(z)}(y_i-\bar{y})^2}\]

The parameter rf enables the use of a random forest for computing the standard deviation of predictions instead of the nearest neighbours.
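
A sketch of the two variants of the score, assuming lam and xi are the normalized measures \(\lambda^*\) and \(\xi^*\):

>>> import numpy as np
>>> def abs_error_normalized(y_hat, y, lam, xi, gamma=0.5, rho=0.5, exp=True):
...     denom = np.exp(gamma * lam) + np.exp(rho * xi) if exp else gamma + lam + xi
...     return abs(y_hat - y) / denom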

__init__(classifier, distance=Euclidean(), k=10, gamma=0.5, rho=0.5, exp=True, rf=None)[source]

Initialize the parameters.

fit(data)[source]

Train the underlying model and precompute medians for nonconformity scores.

_d(inst)[source]

Sum of distances to nearest neighbours.

_lambda(inst)[source]

Normalized distance measure.

_sigma(inst)[source]

Standard deviation of y values. This comes either from the nearest neighbours or, if rf is provided, from the predictions of the individual trees in the random forest.

_xi(inst)[source]

Normalized variance measure.

norm(inst)[source]

Compute the normalization factor.

nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.LOORegrNC(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]

Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.RegrNC

Leave-one-out regression nonconformity combines the local prediction error within the instance’s neighbourhood with the error of the prediction for the instance itself:

\[\mathit{nc} = \mathit{error} + |\hat{y}-y^{*}| \quad \text{or} \quad \mathit{nc} = \frac{|\hat{y}-y^{*}|}{\mathit{error}}\]

\(\hat{y}\) … the value predicted from \(N_k(z^*)\), the k nearest neighbours of the instance \(z^*\)

The first nonconformity score is used when the parameter relative is set to False and the second one when it is set to True.

\[\mathit{error} = \frac {\sum_{z_i \in N_k(z^*)} w_i |\hat{y_i}-y_i|} {\sum_{z_i \in N_k(z^*)} w_i}, \quad w_i = \frac{1}{d(x^*, x_i)}\]

\(\hat{y_i}\) … the value predicted from \(N_k(z') \setminus \{z_i\}\), or \((N_k(z') \setminus \{z_i\}) \cup \{z^*\}\) if the parameter include is set to True. \(z'\) is \(z^*\) if the neighbourhood parameter is ‘fixed’ and \(z_i\) if it is ‘variable’.

__init__(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]

Initialize the parameters.

fit(data)[source]

Store the data for finding nearest neighbours and initialize cache.

get_neighbourhood(inst)[source]

Construct an Orange data Table consisting of the instance’s k nearest neighbours. Cache the result for later calls with the same instance.

error(inst, neighbours)[source]

Compute the average weighted error for predicting the value of each neighbour from the other ones. Include the new example among the neighbours if the parameter include is True.

nonconformity(inst)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.RegrNearestNeighboursNC(distance=Euclidean(), k=1)[source]

Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.RegrNC

Base class for nearest neighbours based regression nonconformity scores.

class conformal.nonconformity.AbsErrorKNN(distance=Euclidean(), k=10, average=False, variance=False)[source]

Bases: conformal.nonconformity.RegrNearestNeighboursNC

Absolute error of k nearest neighbours computes the average value of the k nearest neighbours and returns the absolute difference between this average (\(\bar{y}\)) and the actual value (\(y^{*}\)).

\[\begin{split}\bar{y} &= 1/k \sum_{N_k(x^{*})} y_i \\ \mathit{nc} &= |\bar{y} - y^{*}|\end{split}\]

The weighted version can normalize by the average and/or the variance, as sketched below.

\[\mathit{nc} = \frac{ |\bar{y}-y^{*}| } { \bar{y} \cdot y_{\sigma} }\]
average

Normalize by average.

Type: bool
variance

Normalize by variance.

Type: bool
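
A hypothetical sketch of the score, assuming ys holds the values of the k nearest neighbours:

>>> import numpy as np
>>> def abs_error_knn(ys, y_star, average=False, variance=False):
...     nc = abs(np.mean(ys) - y_star)
...     if average:
...         nc /= np.mean(ys)
...     if variance:
...         nc /= np.std(ys)
...     return nc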

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AbsErrorKNN(Euclidean(), 10, average=True), 2, train)
>>> print(cr(test[0], 0.1))
__init__(distance=Euclidean(), k=10, average=False, variance=False)[source]

Initialize the distance measure, the number of nearest neighbours to consider, and whether to normalize by the average and/or the variance.

stats(instance)[source]

Compute the mean and standard deviation of the values of the k nearest neighbours.

norm(avg, std)[source]

Compute the normalization factor according to the chosen properties.

nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.

class conformal.nonconformity.AvgErrorKNN(distance=Euclidean(), k=1)[source]

Bases: conformal.nonconformity.RegrNearestNeighboursNC

Average error of k nearest neighbours computes the average absolute difference between the actual value (\(y^{*}\)) and the values of the k nearest neighbours (\(y_i\)).

\[\mathit{nc} = 1/k \sum_{N_k(x^{*})} |y^{*} - y_i|\]

Note

There might be no suitable y values for the required significance level at the time of prediction. In such cases, the predicted range is [nan, nan].
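
A sketch of the quantity computed by nonconformity() and inverted by predict(), assuming ys holds the values of the k nearest neighbours:

>>> import numpy as np
>>> def avg_abs(y, ys):
...     # average absolute difference between a candidate value y and the neighbours
...     return np.mean([abs(y - yi) for yi in ys])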

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AvgErrorKNN(Euclidean(), 10), 2, train)
>>> print(cr(test[0], 0.1))
avg_abs(y, ys)[source]

Compute the average absolute difference between the value y and the values ys.

avg_abs_inv(nc, ys)[source]

Determine the range of values whose average absolute difference from the values ys does not exceed nc.
nonconformity(instance)[source]

Compute the nonconformity score of the given instance.

predict(inst, nc)[source]

Compute the inverse of the nonconformity score. Determine a range of values for which the nonconformity of the given instance does not exceed nc.