Gaussian process regression (GPR).
The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.
In addition to the standard scikit-learn estimator API, GaussianProcessRegressor:
* allows prediction without prior fitting (based on the GP prior)
* provides an additional method ``sample_y(X)``, which evaluates samples
  drawn from the GPR (prior or posterior) at given inputs (a brief sketch
  follows this list)
* exposes a method ``log_marginal_likelihood(theta)``, which can be used
  externally for other ways of selecting hyperparameters, e.g., via
  Markov chain Monte Carlo.
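For illustration, a minimal sketch of the first two points above, drawing
samples from the unfitted GP prior on a small 1-D grid (variable names
here are illustrative only)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    gpr = GaussianProcessRegressor()           # default 1.0 * RBF(1.0) prior
    X_grid = np.linspace(0, 5, 10).reshape(-1, 1)

    # Prediction before fitting falls back to the GP prior (zero mean).
    mean_prior, std_prior = gpr.predict(X_grid, return_std=True)

    # Draw three sample functions from the prior at the same inputs.
    samples = gpr.sample_y(X_grid, n_samples=3, random_state=0)  # shape (10, 3)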
Read more in the :ref:`User Guide <gaussian_process>`.
.. versionadded:: 0.18
Parameters
----------
kernel : kernel instance, default=None
    The kernel specifying the covariance function of the GP. If None is
    passed, the kernel ``1.0 * RBF(1.0)`` is used as default. Note that
    the kernel's hyperparameters are optimized during fitting.
alpha : float or array-like of shape (n_samples,), default=1e-10
    Value added to the diagonal of the kernel matrix during fitting.
    Larger values correspond to an increased noise level in the
    observations. This can also prevent a potential numerical issue
    during fitting, by ensuring that the calculated values form a
    positive definite matrix. If an array is passed, it must have the
    same number of entries as the data used for fitting and is used as
    a datapoint-dependent noise level. Note that this is equivalent to
    adding a WhiteKernel with c=alpha. The ability to specify the noise
    level directly as a parameter is provided mainly for convenience and
    for consistency with Ridge.
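    As a rough illustration of the array-valued case (a sketch with
    made-up data; the variable names are not part of the API)::

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.RandomState(0)
        X = rng.uniform(0, 5, size=(20, 1))
        noise_var = rng.uniform(0.01, 0.1, size=20)  # per-point noise variance
        y = np.sin(X).ravel() + np.sqrt(noise_var) * rng.randn(20)

        # One alpha entry per training point (datapoint-dependent noise).
        gpr = GaussianProcessRegressor(kernel=RBF(), alpha=noise_var).fit(X, y)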
optimizer : 'fmin_l_bfgs_b' or callable, default='fmin_l_bfgs_b'
    Can either be one of the internally supported optimizers for
    optimizing the kernel's parameters, specified by a string, or an
    externally defined optimizer passed as a callable. If a callable is
    passed, it must have the signature::
        def optimizer(obj_func, initial_theta, bounds):
            # * 'obj_func': the objective function to be minimized, which
            #   takes the hyperparameters theta as a parameter and an
            #   optional flag eval_gradient, which determines if the
            #   gradient is returned additionally to the function value
            # * 'initial_theta': the initial value for theta, which can be
            #   used by local optimizers
            # * 'bounds': the bounds on the values of theta
            ....
            # Returned are the best found hyperparameters theta and
            # the corresponding value of the target function.
            return theta_opt, func_min
    Per default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize
    is used. If None is passed, the kernel's parameters are kept fixed.
    Available internal optimizers are::

        'fmin_l_bfgs_b'
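    As a rough sketch of an external optimizer matching the signature
    above (the helper name custom_optimizer is illustrative, not part of
    the API), one could delegate to scipy.optimize.minimize::

        from scipy.optimize import minimize

        def custom_optimizer(obj_func, initial_theta, bounds):
            # Request the objective value together with its gradient so
            # that L-BFGS-B can use analytic gradients (jac=True).
            def value_and_grad(theta):
                return obj_func(theta, eval_gradient=True)

            res = minimize(value_and_grad, initial_theta,
                           method="L-BFGS-B", jac=True, bounds=bounds)
            return res.x, res.fun

    Such a callable would then be passed as ``optimizer=custom_optimizer``
    to the constructor.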
n_restarts_optimizer : int, default=0
    The number of restarts of the optimizer for finding the kernel's
    parameters which maximize the log-marginal likelihood. The first run
    of the optimizer is performed from the kernel's initial parameters,
    the remaining ones (if any) from thetas sampled log-uniform randomly
    from the space of allowed theta-values. If greater than 0, all bounds
    must be finite. Note that n_restarts_optimizer == 0 implies that one
    run is performed.
normalize_y : bool, default=False
    Whether or not to normalize the target values y by removing the mean
    and scaling to unit variance. This is recommended for cases where
    zero-mean, unit-variance priors are used. Note that, in this
    implementation, the normalisation is reversed before the GP
    predictions are reported.

    .. versionchanged:: 0.23
copy_X_train : bool, default=True
    If True, a persistent copy of the training data is stored in the
    object. Otherwise, just a reference to the training data is stored,
    which might cause predictions to change if the data is modified
    externally.
random_state : int or RandomState, default=None
    Determines random number generation used to initialize the centers.
    Pass an int for reproducible results across multiple function calls.
    See :term:`Glossary <random_state>`.
Attributes
----------
X_train_ : array-like of shape (n_samples, n_features) or list of object
    Feature vectors or other representations of training data (also
    required for prediction).

y_train_ : array-like of shape (n_samples,) or (n_samples, n_targets)
    Target values in training data (also required for prediction).

kernel_ : kernel instance
    The kernel used for prediction. The structure of the kernel is the
    same as the one passed as parameter but with optimized
    hyperparameters.

L_ : array-like of shape (n_samples, n_samples)
    Lower-triangular Cholesky decomposition of the kernel in ``X_train_``.

alpha_ : array-like of shape (n_samples,)
    Dual coefficients of training data points in kernel space.

log_marginal_likelihood_value_ : float
    The log-marginal-likelihood of ``self.kernel_.theta``.
Examples
--------
>>> from sklearn.datasets import make_friedman2
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
>>> X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
>>> kernel = DotProduct() + WhiteKernel()
>>> gpr = GaussianProcessRegressor(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpr.score(X, y)
0.3680...
>>> gpr.predict(X[:2,:], return_std=True)
(array([653.0..., 592.1...]), array([316.6..., 316.6...]))
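Continuing from the fitted ``gpr`` above, the exposed log-marginal
likelihood can also be evaluated at hyperparameters other than the
optimized ones, e.g. inside an external selection loop such as MCMC (a
sketch; the offsets below are arbitrary)::

    theta_opt = gpr.kernel_.theta              # log-transformed hyperparameters
    for delta in (-0.5, 0.0, 0.5):
        lml = gpr.log_marginal_likelihood(theta_opt + delta)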