package sklearn

val get_py : string -> Py.Object.t

Get an attribute of this module as a Py.Object.t. This is useful to pass a Python function to another function.

val cartesian : ?out:Py.Object.t -> arrays:Py.Object.t -> unit -> Arr.t

Generate a cartesian product of input arrays.

Parameters ---------- arrays : list of array-like 1-D arrays to form the cartesian product of. out : ndarray Array to place the cartesian product in.

Returns ------- out : ndarray 2-D array of shape (M, len(arrays)) containing cartesian products formed of input arrays.

Examples --------
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])
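To show how the rows above are ordered, here is a minimal pure-OCaml sketch of the same cartesian product over integer arrays. It does not call the binding, and the function shown is a hypothetical stand-in. Each row index is decoded as a mixed-radix number, so the last input array varies fastest, which matches the example output.

```ocaml
(* Hypothetical pure-OCaml cartesian product; not the binding's API.
   Row r is decoded as a mixed-radix number whose last digit varies fastest. *)
let cartesian (arrays : int array list) : int array array =
  (* The number of rows M is the product of the input lengths. *)
  let n_rows = List.fold_left (fun acc a -> acc * Array.length a) 1 arrays in
  let n_cols = List.length arrays in
  let arrs = Array.of_list arrays in
  Array.init n_rows (fun r ->
    let row = Array.make n_cols 0 in
    let rem = ref r in
    for c = n_cols - 1 downto 0 do
      let len = Array.length arrs.(c) in
      row.(c) <- arrs.(c).(!rem mod len);
      rem := !rem / len
    done;
    row)
```

For example, `cartesian [ [|1; 2; 3|]; [|4; 5|]; [|6; 7|] ]` yields 12 rows, from `[|1; 4; 6|]` to `[|3; 5; 7|]`.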

val check_array : ?accept_sparse:[ `S of string | `Bool of bool | `StringList of string list ] -> ?accept_large_sparse:bool -> ?dtype: [ `S of string | `Dtype of Py.Object.t | `TypeList of Py.Object.t | `None ] -> ?order:[ `F | `C ] -> ?copy:bool -> ?force_all_finite:[ `Bool of bool | `Allow_nan ] -> ?ensure_2d:bool -> ?allow_nd:bool -> ?ensure_min_samples:int -> ?ensure_min_features:int -> ?warn_on_dtype:bool -> ?estimator:[ `S of string | `Estimator of Py.Object.t ] -> array:Py.Object.t -> unit -> Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters ---------- array : object Input object to check / convert.

accept_sparse : string, boolean or list/tuple of strings (default=False) Strings representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

.. versionadded:: 0.20

dtype : string, type, list of types or None (default="numeric") Data type of result. If None, the dtype of the input is preserved. If "numeric", dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion to the first type is only performed if the dtype of the input is not in the list.

order : 'F', 'C' or None (default=None) Whether an array will be forced to be Fortran- or C-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf and np.nan in array. The possibilities are:

  • True: Force all values of array to be finite.
  • False: accept both np.inf and np.nan in array.
  • 'allow-nan': accept only np.nan values in array. Values cannot be infinite.

For object dtyped data, only np.nan is checked and not np.inf.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.
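The three force_all_finite modes can be modelled with a short pure-OCaml sketch over a plain float array. The type and function names are hypothetical, and the sketch signals rejection with OCaml's Invalid_argument where the Python function raises ValueError.

```ocaml
(* Hypothetical model of the three force_all_finite modes; not the binding's API. *)
type finite_mode = Force_finite | Allow_all | Allow_nan_only

let check_finite (mode : finite_mode) (xs : float array) : unit =
  match mode with
  | Allow_all -> ()                     (* accept both inf and nan *)
  | Force_finite ->
      (* Float.is_finite is false for nan and for +/- infinity. *)
      if not (Array.for_all Float.is_finite xs) then
        invalid_arg "check_finite: input contains NaN or infinity"
  | Allow_nan_only ->
      (* NaN is tolerated, but +/- infinity is still rejected. *)
      if Array.exists (fun x -> classify_float x = FP_infinite) xs then
        invalid_arg "check_finite: input contains infinity"
```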

ensure_2d : boolean (default=True) Whether to raise a ValueError if array is not 2D.

allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 disables this check.

warn_on_dtype : boolean or None, optional (default=None) Raise DataConversionWarning if the dtype of the input data structure does not match the requested dtype, causing a memory copy.

.. deprecated:: 0.21 ``warn_on_dtype`` is deprecated in version 0.21 and will be removed in 0.23.

estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns ------- array_converted : object The converted and validated array.

val check_random_state : seed:[ `I of int | `RandomState of Py.Object.t | `None ] -> unit -> Py.Object.t

Turn seed into a np.random.RandomState instance

Parameters ---------- seed : None | int | instance of RandomState If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
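The seed dispatch described above can be sketched in pure OCaml, with `Random.State.t` standing in for numpy's RandomState. The `seed` type and `global_state` are hypothetical. In this version the "otherwise raise ValueError" branch cannot occur, because the variant type rules out invalid seeds at compile time.

```ocaml
(* Hypothetical seed type; Random.State.t stands in for np.random.RandomState. *)
type seed = No_seed | Int_seed of int | Rng_state of Random.State.t

(* Stand-in for the global singleton that np.random uses when seed is None. *)
let global_state = Random.State.make_self_init ()

let check_random_state (s : seed) : Random.State.t =
  match s with
  | No_seed -> global_state                    (* shared singleton *)
  | Int_seed n -> Random.State.make [| n |]    (* fresh state, deterministic per seed *)
  | Rng_state st -> st                         (* returned unchanged *)
```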

val density : ?kwargs:(string * Py.Object.t) list -> w:Arr.t -> unit -> Py.Object.t

Compute density of a sparse vector

Parameters ---------- w : array_like The sparse vector

Returns ------- float The density of w, between 0 and 1
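Density here means the fraction of non-zero entries. Below is a dense-vector sketch in pure OCaml. Sparse inputs, which the Python function handles through the matrix's stored-entry count, are not modelled, and returning 0 for an empty vector is an assumption of this sketch.

```ocaml
(* Hypothetical dense sketch: number of non-zero entries divided by length. *)
let density (w : float array) : float =
  let n = Array.length w in
  if n = 0 then 0.0   (* assumption: an empty vector has density 0 *)
  else
    let nnz = Array.fold_left (fun acc x -> if x <> 0.0 then acc + 1 else acc) 0 w in
    Float.of_int nnz /. Float.of_int n
```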

module Deprecated : sig ... end
val fast_logdet : a:Arr.t -> unit -> Py.Object.t

Compute log(det(A)) for A symmetric

Equivalent to np.log(nl.det(A)), but more robust. It returns -Inf if det(A) is non-positive or is not defined.

Parameters ---------- A : array_like The matrix
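One way to get the robustness described above is to accumulate the sign and the log of each pivot's magnitude, instead of forming det(A) and taking its log. The pure-OCaml sketch below does this with Gaussian elimination and partial pivoting, which is the slogdet idea. It is an illustration under that assumption, not the binding's implementation, and it returns `neg_infinity` whenever the determinant is zero or negative.

```ocaml
(* Hypothetical sketch: log(det(A)) via Gaussian elimination with partial pivoting,
   tracking the sign and the sum of log|pivot| instead of the determinant itself. *)
let fast_logdet (a : float array array) : float =
  let n = Array.length a in
  let m = Array.map Array.copy a in           (* eliminate on a copy *)
  let sign = ref 1.0 and logdet = ref 0.0 in
  (try
     for k = 0 to n - 1 do
       (* Partial pivoting: bring the largest |entry| of column k up to row k. *)
       let p = ref k in
       for i = k + 1 to n - 1 do
         if Float.abs m.(i).(k) > Float.abs m.(!p).(k) then p := i
       done;
       if m.(!p).(k) = 0.0 then raise Exit;  (* singular, det(A) = 0 *)
       if !p <> k then begin
         let tmp = m.(k) in
         m.(k) <- m.(!p);
         m.(!p) <- tmp;
         sign := -. !sign                    (* each row swap flips the sign *)
       end;
       let pivot = m.(k).(k) in
       if pivot < 0.0 then sign := -. !sign;
       logdet := !logdet +. log (Float.abs pivot);
       for i = k + 1 to n - 1 do
         let f = m.(i).(k) /. pivot in
         for j = k to n - 1 do
           m.(i).(j) <- m.(i).(j) -. f *. m.(k).(j)
         done
       done
     done
   with Exit -> sign := 0.0);
  (* Non-positive determinant: return -inf, as documented above. *)
  if !sign > 0.0 then !logdet else neg_infinity
```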

val log_logistic : ?out:[ `Arr of Arr.t | `T_ of Py.Object.t ] -> x:Arr.t -> unit -> Arr.t

Compute the log of the logistic function, ``log(1 / (1 + e ** -x))``.

This implementation is numerically stable because it splits positive and negative values::

    -log(1 + exp(-x_i))       if x_i > 0
    x_i - log(1 + exp(x_i))   if x_i <= 0

For the ordinary logistic function, use ``scipy.special.expit``.

Parameters ---------- X : array-like, shape (M, N) or (M, ) Argument to the logistic function

out : array-like, shape: (M, N) or (M, ), optional: Preallocated output array.

Returns ------- out : array, shape (M, N) or (M, ) Log of the logistic function evaluated at every point in x

Notes ----- See the blog post describing this implementation: http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/
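The sign split described above can be written directly in pure OCaml. This is a hypothetical element-wise sketch, not the binding's API. It uses `log1p` in place of `log(1 + ...)`, a small refinement that keeps precision when the exponential term is tiny. In both branches the argument to `exp` is never positive, so nothing overflows.

```ocaml
(* Hypothetical sketch of a numerically stable log(1 / (1 + exp(-x))).
   Both branches only exponentiate non-positive numbers, so exp cannot overflow. *)
let log_logistic (x : float) : float =
  if x > 0.0 then -. log1p (exp (-. x))
  else x -. log1p (exp x)

(* Element-wise version over an array, analogous to the (M,) case. *)
let log_logistic_array (xs : float array) : float array = Array.map log_logistic xs
```

A naive `log (1.0 /. (1.0 +. exp (-. x)))` returns `neg_infinity` for `x = -1000.0` because `exp 1000.0` overflows, while the split form returns `-1000.0`.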

val make_nonnegative : ?min_value:Py.Object.t -> x:Arr.t -> unit -> Py.Object.t

Ensure `X.min()` >= `min_value`.

Parameters ---------- X : array_like The matrix to make non-negative min_value : float The threshold value

Returns ------- array_like The thresholded array

Raises ------ ValueError When X is sparse
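A dense pure-OCaml sketch of the guarantee `X.min() >= min_value` follows. It assumes the threshold is met by shifting the whole array up by `(min_value - min)` when the minimum is too small, so the new minimum equals `min_value`. Treat that as an assumption about the approach, not a statement of the binding's behaviour. The sparse case, which raises ValueError above, is not modelled.

```ocaml
(* Hypothetical dense sketch: shift every entry up so the minimum becomes min_value.
   Arrays already at or above the threshold are returned unchanged. *)
let make_nonnegative ?(min_value = 0.0) (x : float array) : float array =
  let mn = Array.fold_left Float.min infinity x in
  if mn < min_value then Array.map (fun v -> v +. (min_value -. mn)) x
  else x
```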

val randomized_range_finder : ?power_iteration_normalizer:[ `Auto | `QR | `LU | `None ] -> ?random_state:int -> a:Py.Object.t -> size:int -> n_iter:int -> unit -> Py.Object.t

Computes an orthonormal matrix whose range approximates the range of A.

Parameters ---------- A : 2D array The input data matrix

size : integer Size of the return array

n_iter : integer Number of power iterations used to stabilize the result

power_iteration_normalizer : 'auto' (default), 'QR', 'LU', 'none' Whether the power iterations are normalized with step-by-step QR factorization (the slowest but most accurate), 'none' (the fastest but numerically unstable when `n_iter` is large, e.g. typically 5 or larger), or 'LU' factorization (numerically stable but can lose slightly in accuracy). The 'auto' mode applies no normalization if `n_iter` <= 2 and switches to LU otherwise.

.. versionadded:: 0.18

random_state : int, RandomState instance or None, optional (default=None) The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `np.random`.

Returns ------- Q : 2D array A matrix of shape (A.shape[0], size) with orthonormal columns, the range of which approximates well the range of the input matrix A.

Notes -----

Follows Algorithm 4.3 of Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions Halko, et al., 2009 (arXiv:0909.4061) https://arxiv.org/pdf/0909.4061.pdf

An implementation of a randomized algorithm for principal component analysis A. Szlam et al. 2014
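Below is a minimal dense pure-OCaml sketch of the range finder, for readers who want to see the steps. It draws a Gaussian test matrix Omega of shape (n_features, size), forms Q from the columns of A·Omega, and then runs `n_iter` power iterations (Aᵀ then A). Modified Gram-Schmidt stands in for the QR/LU normalizers and is applied at every half step, roughly like the 'QR' mode. The integer `seed` replaces `random_state`, and all helper names are hypothetical.

```ocaml
type mat = float array array  (* row-major: m.(i).(j) is row i, column j *)

let matmul (a : mat) (b : mat) : mat =
  let n = Array.length a and k = Array.length b and p = Array.length b.(0) in
  Array.init n (fun i ->
    Array.init p (fun j ->
      let s = ref 0.0 in
      for l = 0 to k - 1 do s := !s +. a.(i).(l) *. b.(l).(j) done;
      !s))

let transpose (a : mat) : mat =
  let r = Array.length a and c = Array.length a.(0) in
  Array.init c (fun j -> Array.init r (fun i -> a.(i).(j)))

(* Modified Gram-Schmidt on the columns of q; assumes they are linearly independent. *)
let orthonormalize (q : mat) : mat =
  let cols = transpose q in
  let dot u v = Array.fold_left ( +. ) 0.0 (Array.map2 ( *. ) u v) in
  for j = 0 to Array.length cols - 1 do
    for i = 0 to j - 1 do
      let r = dot cols.(i) cols.(j) in
      cols.(j) <- Array.map2 (fun cj ci -> cj -. r *. ci) cols.(j) cols.(i)
    done;
    let nrm = sqrt (dot cols.(j) cols.(j)) in
    cols.(j) <- Array.map (fun x -> x /. nrm) cols.(j)
  done;
  transpose cols

(* Standard normal draw via Box-Muller; u1 is redrawn if it is 0 so log stays finite. *)
let rec gaussian st =
  let u1 = Random.State.float st 1.0 in
  if u1 = 0.0 then gaussian st
  else
    let u2 = Random.State.float st 1.0 in
    sqrt (-2.0 *. log u1) *. cos (2.0 *. Float.pi *. u2)

let randomized_range_finder ?(seed = 0) (a : mat) ~(size : int) ~(n_iter : int) : mat =
  let st = Random.State.make [| seed |] in
  (* Gaussian test matrix Omega of shape (n_features, size). *)
  let omega = Array.init (Array.length a.(0)) (fun _ -> Array.init size (fun _ -> gaussian st)) in
  let at = transpose a in
  let q = ref (orthonormalize (matmul a omega)) in
  for _it = 1 to n_iter do
    (* One power iteration, re-orthonormalizing after each product. *)
    q := orthonormalize (matmul a (orthonormalize (matmul at !q)))
  done;
  !q
```

For a matrix of exact rank `size`, Q·Qᵀ·A should reproduce A up to rounding, which is a convenient sanity check for the sketch.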

val randomized_svd : ?n_oversamples:Py.Object.t -> ?n_iter:[ `I of int | `T_auto_ of Py.Object.t ] -> ?power_iteration_normalizer:[ `Auto | `QR | `LU | `None ] -> ?transpose:[ `Bool of bool | `Auto ] -> ?flip_sign:[ `Bool of bool | `T_True_by of Py.Object.t ] -> ?random_state:int -> m:Arr.t -> n_components:int -> unit -> Py.Object.t

Computes a truncated randomized SVD

Parameters ---------- M : ndarray or sparse matrix Matrix to decompose

n_components : int Number of singular values and vectors to extract.

n_oversamples : int (default is 10) Additional number of random vectors to sample the range of M so as to ensure proper conditioning. The total number of random vectors used to find the range of M is n_components + n_oversamples. A smaller number can improve speed but can negatively impact the quality of approximation of singular vectors and singular values.

n_iter : int or 'auto' (default is 'auto') Number of power iterations. It can be used to deal with very noisy problems. When 'auto', it is set to 4, unless `n_components` is small (< .1 * min(M.shape)), in which case `n_iter` is set to 7. This improves precision with few components.

.. versionchanged:: 0.18
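The 'auto' rule for `n_iter` reduces to a one-line comparison. Below is a hypothetical pure-OCaml helper that encodes the rule as stated above, taking the matrix shape as a (rows, columns) pair.

```ocaml
(* Hypothetical helper encoding the documented 'auto' rule for n_iter:
   7 when n_components < 0.1 * min(M.shape), otherwise 4. *)
let auto_n_iter ~(n_components : int) ~(shape : int * int) : int =
  let rows, cols = shape in
  if Float.of_int n_components < 0.1 *. Float.of_int (min rows cols) then 7 else 4
```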

power_iteration_normalizer : 'auto' (default), 'QR', 'LU', 'none' Whether the power iterations are normalized with step-by-step QR factorization (the slowest but most accurate), 'none' (the fastest but numerically unstable when `n_iter` is large, e.g. typically 5 or larger), or 'LU' factorization (numerically stable but can lose slightly in accuracy). The 'auto' mode applies no normalization if `n_iter` <= 2 and switches to LU otherwise.

.. versionadded:: 0.18

transpose : True, False or 'auto' (default) Whether the algorithm should be applied to M.T instead of M. The result should be approximately the same. The 'auto' mode will trigger the transposition if M.shape[1] > M.shape[0], since this implementation of randomized SVD tends to be a little faster in that case.

.. versionchanged:: 0.18

flip_sign : boolean, (True by default) The output of a singular value decomposition is only unique up to a permutation of the signs of the singular vectors. If `flip_sign` is set to `True`, the sign ambiguity is resolved by making the largest loadings for each component in the left singular vectors positive.

random_state : int, RandomState instance or None, optional (default=None) The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `np.random`.

Notes ----- This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices on which you wish to extract only a small number of components. In order to obtain further speed up, `n_iter` can be set <=2 (at the cost of loss of precision).

References ---------- * Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions Halko, et al., 2009 https://arxiv.org/abs/0909.4061

* A randomized algorithm for the decomposition of matrices Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert

* An implementation of a randomized algorithm for principal component analysis A. Szlam et al. 2014

val row_norms : ?squared:Py.Object.t -> x:Arr.t -> unit -> Py.Object.t

Row-wise (squared) Euclidean norm of X.

Equivalent to np.sqrt((X * X).sum(axis=1)), but also supports sparse matrices and does not create an X.shape-sized temporary.

Performs no input validation.

Parameters ---------- X : array_like The input array squared : bool, optional (default = False) If True, return squared norms.

Returns ------- array_like The row-wise (squared) Euclidean norm of X.
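The point of the equivalence above is that each row is reduced on its own, so no X-shaped temporary is allocated. A dense pure-OCaml sketch follows. The function is hypothetical and covers only dense input.

```ocaml
(* Hypothetical dense sketch of row_norms: one fold per row, no X-sized temporary. *)
let row_norms ?(squared = false) (x : float array array) : float array =
  Array.map (fun row ->
    let s = Array.fold_left (fun acc v -> acc +. v *. v) 0.0 row in
    if squared then s else sqrt s) x
```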

val safe_min : x:Arr.t -> unit -> Py.Object.t

DEPRECATED: safe_min is deprecated in version 0.22 and will be removed in version 0.24.

Returns the minimum value of a dense or a CSR/CSC matrix.

Adapted from https://stackoverflow.com/q/13426580

.. deprecated:: 0.22.0

Parameters ---------- X : array_like The input array or sparse matrix

Returns ------- Float The min value of X

val safe_sparse_dot : ?dense_output:Py.Object.t -> a:Arr.t -> b:Py.Object.t -> unit -> Arr.t

Dot product that handles the sparse matrix case correctly.

Parameters ---------- a : array or sparse matrix b : array or sparse matrix dense_output : boolean, (default=False) When False, ``a`` and ``b`` both being sparse will yield sparse output. When True, output will always be a dense array.

Returns ------- dot_product : array or sparse matrix sparse if ``a`` and ``b`` are sparse and ``dense_output=False``.

val softmax : ?copy:bool -> x:Py.Object.t -> unit -> Arr.t

Calculate the softmax function.

The softmax function is calculated by np.exp(X) / np.sum(np.exp(X), axis=1)

This will cause overflow when large values are exponentiated. Hence the largest value in each row is subtracted from each data point to prevent this.

Parameters ---------- X : array-like of floats, shape (M, N) Argument to the softmax function

copy : bool, optional Copy X or not.

Returns ------- out : array, shape (M, N) Softmax function evaluated at every point in x
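The max-subtraction trick described above can be sketched row by row in pure OCaml. The function is hypothetical and always allocates a new result, so it corresponds to `copy=true`. Subtracting the row maximum leaves the result unchanged because the factor exp(-max) cancels between numerator and denominator.

```ocaml
(* Hypothetical row-wise softmax: subtract each row's max before exponentiating,
   so the largest exponent is exp 0 = 1 and nothing overflows. *)
let softmax (x : float array array) : float array array =
  Array.map (fun row ->
    let mx = Array.fold_left Float.max neg_infinity row in
    let e = Array.map (fun v -> exp (v -. mx)) row in
    let s = Array.fold_left ( +. ) 0.0 e in
    Array.map (fun v -> v /. s) e) x
```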

val squared_norm : x:Arr.t -> unit -> Py.Object.t

Squared Euclidean or Frobenius norm of x.

Faster than norm(x) ** 2.
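For both vectors and matrices, the squared norm is the sum of squared entries, so no square root is taken and then undone. Here is a hypothetical pure-OCaml sketch for a dense matrix. A vector can be passed as a single row.

```ocaml
(* Hypothetical sketch: squared Frobenius norm as a plain sum of squares. *)
let squared_norm (x : float array array) : float =
  Array.fold_left (fun acc row ->
    Array.fold_left (fun a v -> a +. v *. v) acc row) 0.0 x
```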

Parameters ---------- x : array_like

Returns ------- float The squared Euclidean norm when x is a vector, the squared Frobenius norm when x is a matrix (2-d array).

val stable_cumsum : ?axis:Py.Object.t -> ?rtol:Py.Object.t -> ?atol:Py.Object.t -> arr:Arr.t -> unit -> Py.Object.t

Use high precision for cumsum and check that final value matches sum

Parameters ---------- arr : array-like To be cumulatively summed as flat axis : int, optional Axis along which the cumulative sum is computed. The default (None) is to compute the cumsum over the flattened array. rtol : float Relative tolerance, see ``np.allclose`` atol : float Absolute tolerance, see ``np.allclose``
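Here is a flat-array sketch in pure OCaml of the consistency check described above. It computes the running sum, computes the total independently, and compares the last cumulative value with that total using the `np.isclose` criterion |a - b| <= atol + rtol·|b|. Using Kahan summation for the reference total is an assumption of this sketch. Returning a boolean instead of emitting a warning is also a hypothetical choice, made so the result is easy to inspect.

```ocaml
(* Hypothetical sketch: cumulative sum plus a check that its last element matches an
   independently computed (Kahan-compensated) total within rtol/atol. *)
let stable_cumsum ?(rtol = 1e-05) ?(atol = 1e-08) (arr : float array)
    : float array * bool =
  let n = Array.length arr in
  let out = Array.make n 0.0 in
  let acc = ref 0.0 in
  Array.iteri (fun i v -> acc := !acc +. v; out.(i) <- !acc) arr;
  (* Compensated summation as the reference total. *)
  let sum = ref 0.0 and c = ref 0.0 in
  Array.iter (fun v ->
    let y = v -. !c in
    let t = !sum +. y in
    c := (t -. !sum) -. y;
    sum := t) arr;
  (* Closeness test in the style of np.isclose: |a - b| <= atol + rtol * |b|. *)
  let ok = n = 0 || Float.abs (out.(n - 1) -. !sum) <= atol +. rtol *. Float.abs !sum in
  (out, ok)
```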

val svd_flip : ?u_based_decision:bool -> u:Arr.t -> v:Arr.t -> unit -> Py.Object.t

Sign correction to ensure deterministic output from SVD.

Adjusts the columns of u and the rows of v such that the loadings in the columns in u that are largest in absolute value are always positive.

Parameters ---------- u : ndarray u and v are the output of `linalg.svd` or :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner dimensions so one can compute `np.dot(u * s, v)`.

v : ndarray u and v are the output of `linalg.svd` or :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner dimensions so one can compute `np.dot(u * s, v)`.

u_based_decision : boolean, (default=True) If True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.

Returns ------- u_adjusted, v_adjusted : arrays with the same dimensions as the input.
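Below is a dense pure-OCaml sketch of the sign correction. For each component j, it finds the entry of largest magnitude in column j of u (or row j of v when `u_based_decision` is false) and flips column j of u and row j of v together, so the product u·v is unchanged. The function is hypothetical. Unlike `np.sign`, it treats a zero entry as positive instead of producing a zero sign.

```ocaml
(* Hypothetical sketch of svd_flip: u is (n, k), v is (k, m), both row-major. *)
let svd_flip ?(u_based_decision = true) (u : float array array) (v : float array array)
    : float array array * float array array =
  let k = Array.length v in
  (* Index of the first entry with the largest absolute value. *)
  let argmax_abs len f =
    let best = ref 0 in
    for i = 1 to len - 1 do
      if Float.abs (f i) > Float.abs (f !best) then best := i
    done;
    !best in
  let sign x = if x < 0.0 then -1.0 else 1.0 in
  let signs = Array.init k (fun j ->
    if u_based_decision then
      (let i = argmax_abs (Array.length u) (fun i -> u.(i).(j)) in
       sign u.(i).(j))
    else
      (let i = argmax_abs (Array.length v.(j)) (fun i -> v.(j).(i)) in
       sign v.(j).(i))) in
  (* Flip column j of u and row j of v by the same sign. *)
  let u' = Array.map (fun row -> Array.mapi (fun j x -> x *. signs.(j)) row) u in
  let v' = Array.mapi (fun j row -> Array.map (fun x -> x *. signs.(j)) row) v in
  (u', v')
```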

val weighted_mode : ?axis:Py.Object.t -> a:Arr.t -> w:Py.Object.t -> unit -> Arr.t

Returns an array of the weighted modal (most common) value in `a`.

If there is more than one such value, only the first is returned. The bin-count for the modal bins is also returned.

This is an extension of the algorithm in scipy.stats.mode.

Parameters ---------- a : array_like n-dimensional array of which to find mode(s). w : array_like n-dimensional array of weights for each value axis : int, optional Axis along which to operate. Default is 0, i.e. the first axis.

Returns ------- vals : ndarray Array of modal values. score : ndarray Array of weighted counts for each mode.

Examples --------
>>> from sklearn.utils.extmath import weighted_mode
>>> x = [4, 1, 4, 2, 4, 2]
>>> weights = [1, 1, 1, 1, 1, 1]
>>> weighted_mode(x, weights)
(array([4.]), array([3.]))

The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.

>>> weights = [1, 3, 0.5, 1.5, 1, 2]  # deweight the 4's
>>> weighted_mode(x, weights)
(array([2.]), array([3.5]))

The value 2 has the highest score: it appears twice with weights of 1.5 and 2: the sum of these is 3.5.
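The scoring in these examples can be reproduced with a one-dimensional pure-OCaml sketch. The function is hypothetical and returns a plain (value, score) pair instead of arrays. It scans the distinct values in sorted order and sums the weights of each one. Only a strictly larger score replaces the current best, so ties resolve to the first (smallest) value, which is how this sketch reads "only the first is returned".

```ocaml
(* Hypothetical 1-D sketch of weighted_mode: returns (modal value, weighted count). *)
let weighted_mode (a : float array) (w : float array) : float * float =
  let values = List.sort_uniq compare (Array.to_list a) in
  List.fold_left (fun (best_v, best_s) v ->
    (* Total weight carried by value v. *)
    let s = ref 0.0 in
    Array.iteri (fun i x -> if x = v then s := !s +. w.(i)) a;
    if !s > best_s then (v, !s) else (best_v, best_s))
    (nan, neg_infinity) values
```

With `x = [|4.; 1.; 4.; 2.; 4.; 2.|]`, uniform weights give `(4., 3.)`, and the weights `[|1.; 3.; 0.5; 1.5; 1.; 2.|]` give `(2., 3.5)`, matching the examples above.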

See Also -------- scipy.stats.mode
