A Bagging classifier.
A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]_. If samples are drawn with replacement, then the method is known as Bagging [2]_. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]_. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4]_.
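The four variants differ only in how samples and features are drawn, so each can be expressed through ``max_samples``, ``max_features`` and ``bootstrap``. The sketch below shows one possible mapping, not a canonical one: the 0.5 fractions, the toy dataset and the ``variants`` dictionary are arbitrary choices made for illustration, and the base estimator is left at its default (a decision tree).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)

# Illustrative settings only; the 0.5 fractions are arbitrary.
variants = {
    # Pasting: subsets of the samples, drawn without replacement.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Bagging: samples drawn with replacement.
    "bagging": dict(max_samples=1.0, bootstrap=True),
    # Random Subspaces: all samples, a subset of the features.
    "random_subspaces": dict(max_samples=1.0, bootstrap=False,
                             max_features=0.5),
    # Random Patches: subsets of both samples and features.
    "random_patches": dict(max_samples=0.5, bootstrap=False,
                           max_features=0.5),
}

fitted = {}
for name, params in variants.items():
    clf = BaggingClassifier(n_estimators=5, random_state=0, **params)
    fitted[name] = clf.fit(X, y)
    # Inspect what the first base estimator was trained on.
    sample_idx = clf.estimators_samples_[0]
    n_unique = len(set(sample_idx.tolist()))
    n_feats = len(clf.estimators_features_[0])
    print(f"{name}: {len(sample_idx)} samples drawn "
          f"({n_unique} unique), {n_feats} features")
```

Comparing the number of drawn and unique sample indices makes the effect of ``bootstrap`` visible: without replacement the two counts are equal, while with replacement some indices typically repeat.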
Read more in the :ref:`User Guide <bagging>`.
.. versionadded:: 0.15
Parameters
----------
base_estimator : object, default=None
    The base estimator to fit on random subsets of the dataset.
    If None, then the base estimator is a decision tree.
n_estimators : int, default=10
    The number of base estimators in the ensemble.
max_samples : int or float, default=1.0
    The number of samples to draw from X to train each base estimator (with
    replacement by default, see `bootstrap` for more details).

    - If int, then draw `max_samples` samples.
    - If float, then draw `max_samples * X.shape[0]` samples.

max_features : int or float, default=1.0
    The number of features to draw from X to train each base estimator
    (without replacement by default, see `bootstrap_features` for more
    details).

    - If int, then draw `max_features` features.
    - If float, then draw `max_features * X.shape[1]` features.

bootstrap : bool, default=True
    Whether samples are drawn with replacement. If False, sampling
    without replacement is performed.
bootstrap_features : bool, default=False
    Whether features are drawn with replacement.
oob_score : bool, default=False
    Whether to use out-of-bag samples to estimate the generalization
    error.
warm_start : bool, default=False
    When set to True, reuse the solution of the previous call to fit
    and add more estimators to the ensemble; otherwise, fit a whole
    new ensemble. See :term:`the Glossary <warm_start>`.

    .. versionadded:: 0.17
       *warm_start* constructor parameter.

n_jobs : int, default=None
    The number of jobs to run in parallel for both :meth:`fit` and
    :meth:`predict`. ``None`` means 1 unless in a
    :obj:`joblib.parallel_backend` context. ``-1`` means using all
    processors. See :term:`Glossary <n_jobs>` for more details.
random_state : int or RandomState, default=None
    Controls the random resampling of the original dataset
    (sample wise and feature wise).
    If the base estimator accepts a `random_state` attribute, a different
    seed is generated for each instance in the ensemble.
    Pass an int for reproducible output across multiple function calls.
    See :term:`Glossary <random_state>`.
verbose : int, default=0
    Controls the verbosity when fitting and predicting.
Attributes
----------
base_estimator_ : estimator
    The base estimator from which the ensemble is grown.
n_features_ : int
    The number of features when :meth:`fit` is performed.
estimators_ : list of estimators
    The collection of fitted base estimators.
estimators_samples_ : list of arrays
    The subset of drawn samples (i.e., the in-bag samples) for each base
    estimator. Each subset is defined by an array of the indices selected.
estimators_features_ : list of arrays
    The subset of drawn features for each base estimator.
classes_ : ndarray of shape (n_classes,)
    The class labels.
n_classes_ : int or list
    The number of classes.
oob_score_ : float
    Score of the training dataset obtained using an out-of-bag estimate.
    This attribute exists only when ``oob_score`` is True.
oob_decision_function_ : ndarray of shape (n_samples, n_classes)
    Decision function computed with out-of-bag estimate on the training
    set. If `n_estimators` is small, a data point may never have been
    left out during the bootstrap. In this case,
    `oob_decision_function_` might contain NaN. This attribute exists
    only when ``oob_score`` is True.
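
The out-of-bag attributes can be inspected after fitting with ``oob_score=True``. The sketch below is illustrative only: the dataset and the choice of 50 estimators are assumptions, picked so that every training point is very likely to be left out of at least one bootstrap sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)

# Out-of-bag estimation needs bootstrap=True, which is the default.
clf = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0)
clf.fit(X, y)

# Score estimated from samples each base estimator did not see.
print(clf.oob_score_)
# One row per training sample, one column per class.
print(clf.oob_decision_function_.shape)
# One entry per base estimator in each of these lists.
print(len(clf.estimators_), len(clf.estimators_samples_),
      len(clf.estimators_features_))
```

Because each training point is scored only by estimators that did not draw it, ``oob_score_`` gives a generalization estimate without a separate validation split.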
Examples
--------
>>> from sklearn.svm import SVC
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = BaggingClassifier(base_estimator=SVC(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

References
----------

.. [1] L. Breiman, 'Pasting small votes for classification in large
       databases and on-line', Machine Learning, 36(1), 85-103, 1999.

.. [2] L. Breiman, 'Bagging predictors', Machine Learning, 24(2), 123-140,
       1996.

.. [3] T. Ho, 'The random subspace method for constructing decision
       forests', Pattern Analysis and Machine Intelligence, 20(8), 832-844,
       1998.

.. [4] G. Louppe and P. Geurts, 'Ensembles on Random Patches', Machine
       Learning and Knowledge Discovery in Databases, 346-361, 2012.