package sklearn

You can search for identifiers within the package.

in-package search v0.2.0

sklearn
- Sklearn

Legend:
Library
Module
Module type
Parameter
Class
Class type

type tag = [

| `DictVectorizer

]

type t =
  [ `BaseEstimator | `DictVectorizer | `Object | `TransformerMixin ] Obj.t

val of_pyobject : Py.Object.t -> t

val to_pyobject : [> tag ] Obj.t -> Py.Object.t

val as_transformer : t -> [ `TransformerMixin ] Obj.t

val as_estimator : t -> [ `BaseEstimator ] Obj.t

val create : 
  ?dtype:Np.Dtype.t ->
  ?separator:string ->
  ?sparse:bool ->
  ?sort:bool ->
  unit ->
  t

Transforms lists of feature-value mappings to vectors.

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature 'f' that can take on the values 'ham' and 'spam' will become two features in the output, one signifying 'f=ham', the other 'f=spam'.

However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int, the DictVectorizer can be followed by :class:`sklearn.preprocessing.OneHotEncoder` to complete binary one-hot encoding.

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.

Read more in the :ref:`User Guide <dict_feature_extraction>`.

Parameters ---------- dtype : dtype, default=np.float64 The type of feature values. Passed to Numpy array/scipy.sparse matrix constructors as the dtype argument. separator : str, default='=' Separator string used when constructing new features for one-hot coding. sparse : bool, default=True Whether transform should produce scipy.sparse matrices. sort : bool, default=True Whether ``feature_names_`` and ``vocabulary_`` should be sorted when fitting.

Attributes ---------- vocabulary_ : dict A dictionary mapping feature names to feature indices.

feature_names_ : list A list of length n_features containing the feature names (e.g., 'f=ham' and 'f=spam').

Examples -------- >>> from sklearn.feature_extraction import DictVectorizer >>> v = DictVectorizer(sparse=False) >>> D = {'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1} >>> X = v.fit_transform(D) >>> X array([2., 0., 1.], [0., 1., 3.]) >>> v.inverse_transform(X) == {'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0} True >>> v.transform('foo': 4, 'unseen_feature': 3) array([0., 0., 4.])

See also -------- FeatureHasher : performs vectorization using only a hash function. sklearn.preprocessing.OrdinalEncoder : handles nominal/categorical features encoded as columns of arbitrary data types.

val fit : ?y:Py.Object.t -> x:Py.Object.t -> [> tag ] Obj.t -> t

Learn a list of feature name -> indices mappings.

Parameters ---------- X : Mapping or iterable over Mappings Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). y : (ignored)

Returns ------- self

val fit_transform : 
  ?y:Py.Object.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Learn a list of feature name -> indices mappings and transform X.

Like fit(X) followed by transform(X), but does not require materializing X in memory.

Parameters ---------- X : Mapping or iterable over Mappings Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). y : (ignored)

Returns ------- Xa : array, sparse matrix Feature vectors; always 2-d.

val get_feature_names : [> tag ] Obj.t -> Py.Object.t

Returns a list of feature names, ordered by their indices.

If one-of-K coding is applied to categorical features, this will include the constructed feature names but not the original ones.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters ---------- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns ------- params : mapping of string to any Parameter names mapped to their values.

val inverse_transform : 
  ?dict_type:Np.Dtype.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  Py.Object.t

Transform array or sparse matrix X back to feature mappings.

X must have been produced by this DictVectorizer's transform or fit_transform method; it may only have passed through transformers that preserve the number of features and their order.

In the case of one-hot/one-of-K coding, the constructed feature names and values are returned rather than the original ones.

Parameters ---------- X : array-like, sparse matrix of shape (n_samples, n_features) Sample matrix. dict_type : type, default=dict Constructor for feature mappings. Must conform to the collections.Mapping API.

Returns ------- D : list of dict_type objects of shape (n_samples,) Feature mappings for the samples in X.

val restrict : 
  ?indices:bool ->
  support:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  Py.Object.t

Restrict the features to those in support using feature selection.

This function modifies the estimator in-place.

Parameters ---------- support : array-like Boolean mask or list of indices (as returned by the get_support member of feature selectors). indices : bool, default=False Whether support is a list of indices.

Returns ------- self

Examples -------- >>> from sklearn.feature_extraction import DictVectorizer >>> from sklearn.feature_selection import SelectKBest, chi2 >>> v = DictVectorizer() >>> D = {'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1} >>> X = v.fit_transform(D) >>> support = SelectKBest(chi2, k=2).fit(X, 0, 1) >>> v.get_feature_names() 'bar', 'baz', 'foo' >>> v.restrict(support.get_support()) DictVectorizer() >>> v.get_feature_names() 'bar', 'foo'

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters ---------- **params : dict Estimator parameters.

Returns ------- self : object Estimator instance.

val transform : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Transform feature->value dicts to array or sparse matrix.

Named features not encountered during fit or fit_transform will be silently ignored.

Parameters ---------- X : Mapping or iterable over Mappings of shape (n_samples,) Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype).

Returns ------- Xa : array, sparse matrix Feature vectors; always 2-d.

val vocabulary_ : t -> Dict.t

Attribute vocabulary_: get value or raise Not_found if None.

val vocabulary_opt : t -> Dict.t option

Attribute vocabulary_: get value as an option.

val feature_names_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute feature_names_: get value or raise Not_found if None.

val feature_names_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute feature_names_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.