Matern kernel.
The class of Matern kernels is a generalization of the :class:`RBF`. It has an additional parameter :math:`\nu` which controls the smoothness of the resulting function. The smaller :math:`\nu`, the less smooth the approximated function is. As :math:`\nu\rightarrow\infty`, the kernel becomes equivalent to the :class:`RBF` kernel. When :math:`\nu = 1/2`, the Matern kernel becomes identical to the absolute exponential kernel. Important intermediate values are :math:`\nu=1.5` (once differentiable functions) and :math:`\nu=2.5` (twice differentiable functions).
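These limiting cases can be checked numerically. The following is a minimal
illustrative sketch (not part of this class's API), comparing
:class:`Matern` against :class:`RBF` and an explicit absolute
exponential::

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.gaussian_process.kernels import Matern, RBF

    X = np.random.RandomState(0).rand(5, 2)
    # nu=inf reproduces the RBF (squared exponential) kernel.
    print(np.allclose(Matern(nu=np.inf)(X), RBF()(X)))           # True
    # nu=0.5 is the absolute exponential kernel exp(-d / length_scale).
    print(np.allclose(Matern(nu=0.5)(X), np.exp(-cdist(X, X))))  # True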
The kernel is given by:
.. math::
    k(x_i, x_j) = \frac{1}{\Gamma(\nu)2^{\nu-1}}
    \Bigg( \frac{\sqrt{2\nu}}{l} d(x_i, x_j) \Bigg)^\nu
    K_\nu\Bigg( \frac{\sqrt{2\nu}}{l} d(x_i, x_j) \Bigg)
where :math:`d(\cdot,\cdot)` is the Euclidean distance,
:math:`K_\nu(\cdot)` is a modified Bessel function and
:math:`\Gamma(\cdot)` is the gamma function.
See [1]_, Chapter 4, Section 4.2, for details regarding the different
variants of the Matern kernel.
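As a sanity check, the closed form above can be evaluated directly with
SciPy's gamma and modified Bessel functions and compared against this
kernel. The helper ``matern_value`` below is hypothetical, introduced
only for illustration::

    import numpy as np
    from scipy.special import gamma, kv
    from sklearn.gaussian_process.kernels import Matern

    def matern_value(d, length_scale=1.0, nu=1.5):
        # Direct evaluation of the closed form at Euclidean distance d.
        scaled = np.sqrt(2 * nu) * d / length_scale
        return (2 ** (1 - nu) / gamma(nu)) * scaled ** nu * kv(nu, scaled)

    X = np.array([[0.0], [1.0]])
    k = Matern(length_scale=1.0, nu=1.7)  # generic nu uses the Bessel path
    print(np.allclose(k(X)[0, 1], matern_value(1.0, nu=1.7)))  # True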
Read more in the :ref:`User Guide <gp_kernels>`.
.. versionadded:: 0.18
Parameters
----------
length_scale : float or ndarray of shape (n_features,), default=1.0
    The length scale of the kernel. If a float, an isotropic kernel is
    used. If an array, an anisotropic kernel is used where each dimension
    of l defines the length-scale of the respective feature dimension.
length_scale_bounds : pair of floats >= 0 or 'fixed', default=(1e-5, 1e5)
    The lower and upper bound on 'length_scale'.
    If set to 'fixed', 'length_scale' cannot be changed during
    hyperparameter tuning.
nu : float, default=1.5
    The parameter nu controlling the smoothness of the learned function.
    The smaller nu, the less smooth the approximated function is.
    For nu=inf, the kernel becomes equivalent to the RBF kernel and for
    nu=0.5 to the absolute exponential kernel. Important intermediate
    values are nu=1.5 (once differentiable functions) and nu=2.5
    (twice differentiable functions). Note that values of nu not in
    [0.5, 1.5, 2.5, inf] incur a considerably higher computational cost
    (approx. 10 times higher) since they require evaluating the modified
    Bessel function. Furthermore, in contrast to length_scale, nu is kept
    fixed to its initial value and not optimized.
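The anisotropic use of ``length_scale`` and the fixed status of ``nu``
are easy to confirm through the kernel's public ``hyperparameters`` and
``theta`` attributes; a minimal sketch::

    import numpy as np
    from sklearn.gaussian_process.kernels import Matern

    # Anisotropic variant: one length scale per feature dimension.
    k = Matern(length_scale=np.array([1.0, 10.0]), nu=2.5)
    # Only the length scales are exposed for tuning; nu is not a
    # hyperparameter and keeps its construction value.
    print([h.name for h in k.hyperparameters])  # ['length_scale']
    print(k.theta.shape)                        # (2,)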
References
----------
.. [1] `Carl Edward Rasmussen, Christopher K. I. Williams (2006).
   'Gaussian Processes for Machine Learning'. The MIT Press.
   <http://www.gaussianprocess.org/gpml/>`_
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import Matern
>>> X, y = load_iris(return_X_y=True)
>>> kernel = 1.0 * Matern(length_scale=1.0, nu=1.5)
>>> gpc = GaussianProcessClassifier(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpc.score(X, y)
0.9866...
>>> gpc.predict_proba(X[:2, :])
array([[0.8513..., 0.0368..., 0.1117...],
       [0.8086..., 0.0693..., 0.1220...]])