conformal.nonconformity¶
The nonconformity module contains nonconformity scores for classification and regression.
Structure:

- ClassNC (classification scores)
  - ClassModelNC (model based)
    - InverseProbability
    - ProbabilityMargin
    - SVMDistance
    - LOOClassNC
  - ClassNearestNeighboursNC (nearest neighbours based)
    - KNNDistance
    - KNNFraction
- RegrNC (regression scores)
  - RegrModelNC (model based)
    - AbsError
    - AbsErrorRF
    - AbsErrorNormalized
    - LOORegrNC
    - ErrorModelNC
  - RegrNearestNeighboursNC (nearest neighbours based)
    - AbsErrorKNN
    - AvgErrorKNN
class conformal.nonconformity.ClassNC[source]¶
Bases: object

Base class for classification nonconformity scores.
Extending classes should implement the fit() and nonconformity() methods.
class conformal.nonconformity.ClassModelNC(classifier)[source]¶
Bases: conformal.nonconformity.ClassNC

Base class for classification nonconformity scores that are based on an underlying classifier.
Extending classes should implement the ClassNC.nonconformity() method.

learner¶ Untrained underlying classifier.

model¶ Trained underlying classifier.
class conformal.nonconformity.InverseProbability(classifier)[source]¶
Bases: conformal.nonconformity.ClassModelNC

Inverse probability nonconformity score returns \(1 - p\), where \(p\) is the probability assigned to the actual class by the underlying classification model (ClassModelNC.model).

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> tp = TransductiveClassifier(InverseProbability(NaiveBayesLearner()), train)
>>> print(tp(test[0], 0.1))
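For intuition, a minimal standalone sketch of the \(1 - p\) logic, assuming plain scikit-learn (GaussianNB and load_iris are stand-ins for the package's NaiveBayesLearner and Table, and inverse_probability is a hypothetical helper, not this class's API):

from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X[:-1], y[:-1])  # underlying classification model

def inverse_probability(model, x, actual_class):
    # 1 - p, with p the model's predicted probability for the actual class
    p = model.predict_proba(x.reshape(1, -1))[0][actual_class]
    return 1.0 - p

print(inverse_probability(model, X[-1], y[-1]))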
class conformal.nonconformity.ProbabilityMargin(classifier)[source]¶
Bases: conformal.nonconformity.ClassModelNC

Probability margin nonconformity score measures the difference \(d_p\) between the predicted probability of the actual class and the largest probability assigned to any other class. To put the values on a scale from 0 to 1, the nonconformity function returns \((1 - d_p) / 2\).

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> tp = TransductiveClassifier(ProbabilityMargin(LogisticRegressionLearner()), train)
>>> print(tp(test[0], 0.1))
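A hedged scikit-learn sketch of the \((1 - d_p) / 2\) computation (LogisticRegression and load_iris are stand-ins; probability_margin is a hypothetical helper, not this class's API):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X[:-1], y[:-1])

def probability_margin(model, x, actual_class):
    probs = model.predict_proba(x.reshape(1, -1))[0]
    p_actual = probs[actual_class]
    p_other = np.max(np.delete(probs, actual_class))  # largest other-class probability
    d_p = p_actual - p_other
    return (1.0 - d_p) / 2.0  # maps d_p in [-1, 1] to a score in [0, 1]

print(probability_margin(model, X[-1], y[-1]))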
class conformal.nonconformity.SVMDistance(classifier)[source]¶
Bases: conformal.nonconformity.ClassNC

SVMDistance nonconformity score measures the distance from the SVM's decision boundary. The score depends on the distance and on the side of the decision boundary that the example lies on. Examples that lie on the correct side of the decision boundary, and would therefore be predicted correctly by the SVM classifier, have a nonconformity score below 1, while incorrectly predicted examples have a score above 1.

\[\begin{split}\mathit{nc} = \begin{cases} \frac{1}{1+d} & \text{correct}\\ 1+d & \text{incorrect} \end{cases}\end{split}\]

The provided SVM classifier must be one of sklearn's SVM classifiers (SVC, LinearSVC, NuSVC) providing the decision_function() method, which computes the distance to the decision boundary. This nonconformity score works only for binary classification problems.

Examples

>>> from sklearn.svm import SVC
>>> train, test = next(LOOSampler(Table('titanic')))
>>> train, calibrate = next(RandomSampler(train, 2, 1))
>>> icp = InductiveClassifier(SVMDistance(SVC()), train, calibrate)
>>> print(icp(test[0], 0.1))
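A sketch of the piecewise score above for a binary problem, assuming a fitted sklearn SVC (load_breast_cancer and the svm_distance helper are assumptions of this sketch, not the package's code):

from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = SVC().fit(X[:-1], y[:-1])

def svm_distance(model, x, actual_class):
    margin = model.decision_function(x.reshape(1, -1))[0]  # signed distance
    d = abs(margin)
    # A positive margin predicts model.classes_[1]; check whether that is correct.
    predicted = model.classes_[1] if margin > 0 else model.classes_[0]
    return 1.0 / (1.0 + d) if predicted == actual_class else 1.0 + d

print(svm_distance(model, X[-1], y[-1]))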
class conformal.nonconformity.NearestNeighbours(distance=Euclidean(), k=1)[source]¶
Bases: object

Base class for nonconformity measures based on nearest neighbours.

distance¶ Distance measure.

k¶ Number of nearest neighbours.
Type: int
class conformal.nonconformity.ClassNearestNeighboursNC(distance=Euclidean(), k=1)[source]¶
Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.ClassNC

Base class for nearest neighbours based classification nonconformity scores.
class conformal.nonconformity.KNNDistance(distance=Euclidean(), k=1)[source]¶
Bases: conformal.nonconformity.ClassNearestNeighboursNC

Computes the sum of distances to the k nearest neighbours of the same class as the given instance and the sum of distances to the k nearest neighbours of other classes, and returns their ratio.

Examples

>>> from Orange.distance import Euclidean
>>> train, test = next(LOOSampler(Table('iris')))
>>> cp = CrossClassifier(KNNDistance(Euclidean(), 10), 2, train)
>>> print(cp(test[0], 0.1))
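A plain-numpy sketch of the distance ratio, assuming Euclidean distance (knn_distance is a hypothetical helper, not this class's API):

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def knn_distance(X_train, y_train, x, actual_class, k=10):
    d = np.linalg.norm(X_train - x, axis=1)
    same = np.sort(d[y_train == actual_class])[:k]   # k nearest same-class distances
    other = np.sort(d[y_train != actual_class])[:k]  # k nearest other-class distances
    return same.sum() / other.sum()  # large when same-class neighbours are far away

print(knn_distance(X[:-1], y[:-1], X[-1], y[-1]))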
class conformal.nonconformity.KNNFraction(distance=Euclidean(), k=1, weighted=False)[source]¶
Bases: conformal.nonconformity.ClassNearestNeighboursNC

Computes the k nearest neighbours of the given instance and returns the fraction of instances of the same class as the given instance among its k nearest neighbours.
The weighted version uses distance-based weights \(1/d_i\) instead of simply counting the instances; the non-weighted version is equivalent to using a weight of 1 for all instances.

Examples

>>> train, test = next(LOOSampler(Table('iris')))
>>> cp = CrossClassifier(KNNFraction(Euclidean(), 10, weighted=True), 2, train)
>>> print(cp(test[0], 0.1))
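A sketch of the (optionally weighted) same-class fraction in plain numpy (the 1e-12 guard against zero distances and the knn_fraction helper are assumptions of this sketch):

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def knn_fraction(X_train, y_train, x, actual_class, k=10, weighted=False):
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]                                    # k nearest neighbours
    w = 1.0 / (d[idx] + 1e-12) if weighted else np.ones(k)    # 1/d_i weights or counts
    same = (y_train[idx] == actual_class)
    return w[same].sum() / w.sum()

print(knn_fraction(X[:-1], y[:-1], X[-1], y[-1], weighted=True))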
class conformal.nonconformity.LOOClassNC(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]¶
Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.ClassNC

\[\mathit{nc} = \mathit{error} + (1 - p) \quad \text{or} \quad \mathit{nc} = \frac{1 - p}{\mathit{error}}\]

\(p\) … probability of the actual class predicted from \(N_k(z^*)\), the k nearest neighbours of the instance \(z^*\).
The first nonconformity score is used when the parameter relative is set to False and the second one when it is set to True.

\[\mathit{error} = \frac {\sum_{z_i \in N_k(z^*)} w_i (1 - p_i)} {\sum_{z_i \in N_k(z^*)} w_i}, \quad w_i = \frac{1}{d(x^*, x_i)}\]

\(p_i\) … probability of the actual class predicted from \(N_k(z') \setminus z_i\), or from \(N_k(z') \setminus z_i \cup z^*\) if the parameter include is set to True. \(z'\) is \(z^*\) if the neighbourhood parameter is 'fixed' and \(z_i\) if it is 'variable'. A rough sketch of this scheme follows the method listing below.

__init__(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]¶ Initialize the parameters.

get_neighbourhood(inst)[source]¶ Construct an Orange data Table consisting of the instance's k nearest neighbours. Cache the results for later calls with the same instance.
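A rough numpy/scikit-learn sketch of the relative=False, include=False, neighbourhood='fixed' case (GaussianNB and load_iris stand in for the underlying classifier and data; the 1e-12 weight guard and the maximal-error fallback for classes absent from a leave-one-out model are assumptions of this sketch):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def loo_class_nc(X_tr, y_tr, x, actual_class, k=10):
    d = np.linalg.norm(X_tr - x, axis=1)
    idx = np.argsort(d)[:k]                # fixed neighbourhood N_k(z*)
    w = 1.0 / (d[idx] + 1e-12)             # weights w_i = 1 / d(x*, x_i), guarded
    errs = []
    for j, i in enumerate(idx):            # leave each neighbour z_i out in turn
        rest = np.delete(idx, j)
        m = GaussianNB().fit(X_tr[rest], y_tr[rest])
        cols = {c: n for n, c in enumerate(m.classes_)}
        p_i = m.predict_proba(X_tr[i:i + 1])[0]
        if y_tr[i] in cols:
            errs.append(1.0 - p_i[cols[y_tr[i]]])
        else:
            errs.append(1.0)               # class absent from the LOO model: maximal error
    error = np.average(errs, weights=w)
    m = GaussianNB().fit(X_tr[idx], y_tr[idx])
    cols = {c: n for n, c in enumerate(m.classes_)}
    p = m.predict_proba(x.reshape(1, -1))[0][cols[actual_class]] if actual_class in cols else 0.0
    return error + (1.0 - p)               # relative=False variant

print(loo_class_nc(X[:-1], y[:-1], X[-1], y[-1]))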
class conformal.nonconformity.RegrNC[source]¶
Bases: object

Base class for regression nonconformity scores.
Extending classes should implement the fit(), nonconformity() and predict() methods.
class conformal.nonconformity.RegrModelNC(classifier)[source]¶
Bases: conformal.nonconformity.RegrNC

Base class for regression nonconformity scores that are based on an underlying regression model.
Extending classes should implement the RegrNC.nonconformity() and RegrNC.predict() methods.

learner¶ Untrained underlying regression model.

model¶ Trained underlying regression model.
class conformal.nonconformity.AbsError(classifier)[source]¶
Bases: conformal.nonconformity.RegrModelNC

Absolute error nonconformity score returns the absolute difference between the value predicted by the underlying RegrModelNC.model (\(\hat{y}\)) and the actual value (\(y^{*}\)).

\[\mathit{nc} = |\hat{y}-y^{*}|\]

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AbsError(LinearRegressionLearner()), 2, train)
>>> print(cr(test[0], 0.1))
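A minimal standalone sketch of \(|\hat{y}-y^{*}|\) with a fitted sklearn regressor (load_diabetes and the abs_error helper are stand-ins, not the package's implementation):

from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X[:-1], y[:-1])  # underlying regression model

def abs_error(model, x, actual_value):
    y_hat = model.predict(x.reshape(1, -1))[0]
    return abs(y_hat - actual_value)

print(abs_error(model, X[-1], y[-1]))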
class conformal.nonconformity.AbsErrorRF(classifier, rf, beta=0.5)[source]¶
Bases: conformal.nonconformity.RegrModelNC

AbsErrorRF is based on an underlying regressor and a random forest. The prediction errors of the regressor are used as nonconformity scores and are normalized by the standard deviation of the predictions of the individual trees in the forest.

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\sigma_\mathit{RF} + \beta}\]

Examples

>>> from sklearn.ensemble import RandomForestRegressor
>>> icr = InductiveRegressor(AbsErrorRF(RandomForestRegressionLearner(), RandomForestRegressor()))
>>> r = run(icr, 0.1, CrossSampler(Table('housing'), 10))
>>> print(r.accuracy(), r.median_range(), r.interdecile_mean())
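A sketch of the normalization above, computing \(\sigma_\mathit{RF}\) as the standard deviation of per-tree predictions from a fitted sklearn random forest (load_diabetes and the abs_error_rf helper are assumptions of this sketch):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X[:-1], y[:-1])
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:-1], y[:-1])

def abs_error_rf(reg, rf, x, actual_value, beta=0.5):
    y_hat = reg.predict(x.reshape(1, -1))[0]
    tree_preds = [t.predict(x.reshape(1, -1))[0] for t in rf.estimators_]
    sigma = np.std(tree_preds)  # spread of the individual trees' predictions
    return abs(y_hat - actual_value) / (sigma + beta)

print(abs_error_rf(reg, rf, X[-1], y[-1]))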
class conformal.nonconformity.ErrorModelNC(classifier, error_classifier, beta=0.5, loo=False)[source]¶
Bases: conformal.nonconformity.RegrModelNC

ErrorModelNC is based on two underlying regressors. The first is trained to predict the value, while the second predicts the logarithms of the errors made by the first.
H. Papadopoulos and H. Haralambous. Reliable prediction intervals with regression neural networks. Neural Networks (2011).

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\exp(\mu)-1 + \beta}\]

\(\mu\) … prediction of \(\log(|\hat{y}-y^{*}|+1)\) returned by the second regressor.
The parameter loo determines whether a leave-one-out scheme is used to build the training set of errors for the second regressor.

Examples

>>> nc = ErrorModelNC(SVRLearner(), LinearRegressionLearner())
>>> icr = InductiveRegressor(nc)
>>> r = run(icr, 0.1, CrossSampler(Table('housing'), 10))
>>> print(r.accuracy(), r.median_range(), r.interdecile_mean())
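A sketch of the two-regressor idea with plain scikit-learn, fitting the error model directly on the training errors rather than with the leave-one-out scheme (SVR, LinearRegression and load_diabetes are stand-ins; not the package's implementation):

import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X_tr, y_tr, x, y_star = X[:-1], y[:-1], X[-1], y[-1]

reg = SVR().fit(X_tr, y_tr)
log_errs = np.log(np.abs(reg.predict(X_tr) - y_tr) + 1)  # targets log(|error| + 1)
err_model = LinearRegression().fit(X_tr, log_errs)       # second regressor

y_hat = reg.predict(x.reshape(1, -1))[0]
mu = err_model.predict(x.reshape(1, -1))[0]
beta = 0.5
print(abs(y_hat - y_star) / (np.exp(mu) - 1 + beta))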
class conformal.nonconformity.AbsErrorNormalized(classifier, distance=Euclidean(), k=10, gamma=0.5, rho=0.5, exp=True, rf=None)[source]¶
Bases: conformal.nonconformity.RegrModelNC, conformal.nonconformity.NearestNeighbours

Normalized absolute error uses an underlying regression model to predict the value; the absolute error is then normalized by the distance to and the variance of the nearest neighbours.
H. Papadopoulos, V. Vovk and A. Gammerman. Regression Conformal Prediction with Nearest Neighbours. Journal of Artificial Intelligence Research (2011).

\[\mathit{nc} = \frac{|\hat{y}-y^{*}|}{\exp(\gamma \lambda^*) + \exp(\rho \xi^*)} \quad \text{or} \quad \mathit{nc} = \frac{|\hat{y}-y^{*}|}{\gamma + \lambda^* + \xi^*}\]

The first nonconformity score is used when the parameter exp is set to True and the second one when it is set to False.

\[\lambda^* = \frac{d_k(z^*)}{\mathit{median}(\{d_k(z), z \in T\})}, \quad d_k(z) = \sum_{z_i \in N_k(z)} \mathit{distance}(x, x_i)\]

\[\xi^* = \frac{\sigma_k(z^*)}{\mathit{median}(\{\sigma_k(z), z \in T\})}, \quad \sigma_k(z) = \sqrt{\frac{1}{k} \sum_{z_i \in N_k(z)}(y_i-\bar{y})^2}\]

The parameter rf enables using a random forest to compute the standard deviation of the predictions instead of the nearest neighbours.

__init__(classifier, distance=Euclidean(), k=10, gamma=0.5, rho=0.5, exp=True, rf=None)[source]¶ Initialize the parameters.
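A plain-numpy sketch of the non-exponential form \(|\hat{y}-y^{*}| / (\gamma + \lambda^* + \xi^*)\), computing \(\lambda^*\) and \(\xi^*\) from k-nearest-neighbour distance sums and label spread, each normalized by its median over the training set (LinearRegression, load_diabetes and the helper names are assumptions, not the package's code):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X_tr, y_tr, x, y_star = X[:-1], y[:-1], X[-1], y[-1]
k, gamma = 10, 0.5
model = LinearRegression().fit(X_tr, y_tr)

def d_k(z):  # sum of distances to the k nearest training neighbours
    d = np.sort(np.linalg.norm(X_tr - z, axis=1))
    return d[1:k + 1].sum() if d[0] == 0 else d[:k].sum()  # skip self for training points

def sigma_k(z):  # standard deviation of the neighbours' labels
    idx = np.argsort(np.linalg.norm(X_tr - z, axis=1))[:k]
    return np.std(y_tr[idx])

lam = d_k(x) / np.median([d_k(z) for z in X_tr])           # lambda*
xi = sigma_k(x) / np.median([sigma_k(z) for z in X_tr])    # xi*
y_hat = model.predict(x.reshape(1, -1))[0]
print(abs(y_hat - y_star) / (gamma + lam + xi))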
class conformal.nonconformity.LOORegrNC(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]¶
Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.RegrNC

\[\mathit{nc} = \mathit{error} + |\hat{y}-y^{*}| \quad \text{or} \quad \mathit{nc} = \frac{|\hat{y}-y^{*}|}{\mathit{error}}\]

\(\hat{y}\) … value predicted from \(N_k(z^*)\), the k nearest neighbours of the instance \(z^*\).
The first nonconformity score is used when the parameter relative is set to False and the second one when it is set to True.

\[\mathit{error} = \frac {\sum_{z_i \in N_k(z^*)} w_i |\hat{y_i}-y_i|} {\sum_{z_i \in N_k(z^*)} w_i}, \quad w_i = \frac{1}{d(x^*, x_i)}\]

\(\hat{y_i}\) … value predicted from \(N_k(z') \setminus z_i\), or from \(N_k(z') \setminus z_i \cup z^*\) if the parameter include is set to True. \(z'\) is \(z^*\) if the neighbourhood parameter is 'fixed' and \(z_i\) if it is 'variable'. A rough sketch of this scheme follows the method listing below.

__init__(classifier, distance=Euclidean(), k=10, relative=True, include=False, neighbourhood='fixed')[source]¶ Initialize the parameters.

get_neighbourhood(inst)[source]¶ Construct an Orange data Table consisting of the instance's k nearest neighbours. Cache the results for later calls with the same instance.
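A rough numpy sketch of the relative=False, neighbourhood='fixed' case, using a distance-weighted neighbourhood mean as a stand-in predictor for "value predicted from \(N_k(z^*)\)" (load_diabetes and the 1e-12 weight guards are assumptions of this sketch):

import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X_tr, y_tr, x, y_star = X[:-1], y[:-1], X[-1], y[-1]
k = 10

d = np.linalg.norm(X_tr - x, axis=1)
idx = np.argsort(d)[:k]                    # fixed neighbourhood N_k(z*)
w = 1.0 / (d[idx] + 1e-12)                 # weights w_i = 1 / d(x*, x_i)
y_hat = np.average(y_tr[idx], weights=w)   # value predicted from the neighbourhood

errs = []
for j, i in enumerate(idx):                # leave-one-out within the neighbourhood
    rest = np.delete(idx, j)
    wr = 1.0 / (np.linalg.norm(X_tr[rest] - X_tr[i], axis=1) + 1e-12)
    errs.append(abs(np.average(y_tr[rest], weights=wr) - y_tr[i]))
error = np.average(errs, weights=w)
print(error + abs(y_hat - y_star))         # relative=False variant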
class conformal.nonconformity.RegrNearestNeighboursNC(distance=Euclidean(), k=1)[source]¶
Bases: conformal.nonconformity.NearestNeighbours, conformal.nonconformity.RegrNC

Base class for nearest neighbours based regression nonconformity scores.
class conformal.nonconformity.AbsErrorKNN(distance=Euclidean(), k=10, average=False, variance=False)[source]¶
Bases: conformal.nonconformity.RegrNearestNeighboursNC

Absolute error of k nearest neighbours computes the average value of the k nearest neighbours and returns the absolute difference between this average (\(\bar{y}\)) and the actual value (\(y^{*}\)).

\[\begin{split}\bar{y} &= 1/k \sum_{N_k(x^{*})} y_i \\ \mathit{nc} &= |\bar{y} - y^{*}|\end{split}\]

The weighted version can normalize by the average and/or the variance:

\[\mathit{nc} = \frac{ |\bar{y}-y^{*}| } { \bar{y} \cdot y_{\sigma} }\]

average¶ Normalize by average.
Type: bool

variance¶ Normalize by variance.
Type: bool

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AbsErrorKNN(Euclidean(), 10, average=True), 2, train)
>>> print(cr(test[0], 0.1))

__init__(distance=Euclidean(), k=10, average=False, variance=False)[source]¶ Initialize the distance measure, the number of nearest neighbours to consider, and whether to normalize by the average and by the variance.
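A plain-numpy sketch of \(|\bar{y} - y^{*}|\) with the optional normalizations (load_diabetes and the abs_error_knn helper are assumptions of this sketch, not this class's API):

import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

def abs_error_knn(X_tr, y_tr, x, y_star, k=10, average=False, variance=False):
    idx = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]  # k nearest neighbours
    nc = abs(np.mean(y_tr[idx]) - y_star)
    if average:
        nc /= np.mean(y_tr[idx])   # normalize by the neighbours' average
    if variance:
        nc /= np.std(y_tr[idx])    # normalize by the neighbours' spread
    return nc

print(abs_error_knn(X[:-1], y[:-1], X[-1], y[-1], average=True))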
class conformal.nonconformity.AvgErrorKNN(distance=Euclidean(), k=1)[source]¶
Bases: conformal.nonconformity.RegrNearestNeighboursNC

Average error of k nearest neighbours computes the average absolute difference between the actual value (\(y^{*}\)) and the values of the k nearest neighbours (\(y_i\)).

\[\mathit{nc} = 1/k \sum_{N_k(x^{*})} |y^{*} - y_i|\]

Note
There might be no suitable y values for the required significance level at the time of prediction. In such cases, the predicted range is [nan, nan].

Examples

>>> train, test = next(LOOSampler(Table('housing')))
>>> cr = CrossRegressor(AvgErrorKNN(Euclidean(), 10), 2, train)
>>> print(cr(test[0], 0.1))
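A plain-numpy sketch of the average-error formula above, assuming Euclidean distance (load_diabetes and the avg_error_knn helper are assumptions, not this class's API):

import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

def avg_error_knn(X_tr, y_tr, x, y_star, k=10):
    idx = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]  # k nearest neighbours
    return np.mean(np.abs(y_star - y_tr[idx]))              # mean |y* - y_i|

print(avg_error_knn(X[:-1], y[:-1], X[-1], y[-1]))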