DatasetSplitter
- class tpcp.validate.DatasetSplitter(base_splitter: int | BaseCrossValidator | Iterator | None = None, *, groupby: list[str] | str | None = None, stratify: list[str] | str | None = None)
Wrapper around sklearn cross-validation splitters to support grouping and stratification with tpcp-Datasets.
This wrapper can be used instead of a sklearn-style splitter with all methods that support a cv parameter. Whenever you want to do complicated cv-logic (like grouping or stratification), this wrapper is the way to go.
Warning
We don’t validate if the selected base_splitter does anything useful with the provided groupby and stratify information. This wrapper just ensures that the information is correctly extracted from the dataset and passed to the split method of the base_splitter. So if you are using a normal KFold splitter, the groupby and stratify arguments will have no effect.
- Parameters:
- base_splitter
The base splitter to use. Can be an integer (for KFold), an iterator, or any other valid sklearn-splitter. The default is None, which will use the sklearn default KFold splitter with 5 splits.
- groupby
The column(s) to group by. If None, no grouping is done. Must be a subset of the columns in the dataset.
This will generate string labels, one per data point in the dataset, from the values of the selected columns. These labels will be passed to the base splitter as the groups parameter. It is up to the base splitter to decide what to do with the generated labels.
- stratify
The column(s) to stratify by. If None, no stratification is done. Must be a subset of the columns in the dataset.
This will generate string labels, one per data point in the dataset, from the values of the selected columns. These labels will be passed to the base splitter as the y parameter, acting as “mock” target labels, as sklearn only supports stratification on classification outcome targets. It is up to the base splitter to decide what to do with the generated labels.
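To make the label generation described above more concrete, here is a minimal pure-Python sketch. The helper make_string_labels and the toy index rows are hypothetical and are not tpcp's implementation; they only illustrate the idea that each data point gets one string label built from the selected columns, so rows with identical values end up in the same group or stratum.

```python
# Hypothetical helper, NOT tpcp's actual implementation: build one string
# label per row from the values of the selected index columns.
def make_string_labels(rows, columns):
    if isinstance(columns, str):
        columns = [columns]
    # Join the selected column values so that rows with identical values
    # in these columns receive the same label.
    return ["|".join(str(row[col]) for col in columns) for row in rows]


# Toy stand-in for the rows of a dataset index (assumed column names).
index_rows = [
    {"participant": "p1", "condition": "rest"},
    {"participant": "p1", "condition": "walk"},
    {"participant": "p2", "condition": "rest"},
    {"participant": "p2", "condition": "walk"},
]

# Labels a group-aware splitter would receive as `groups`.
group_labels = make_string_labels(index_rows, "participant")
# Labels a stratified splitter would receive as mock targets `y`.
stratify_labels = make_string_labels(index_rows, ["condition"])

print(group_labels)
print(stratify_labels)
```

Both label lists have exactly one entry per data point, which is what allows a sklearn-style splitter to treat them like ordinary groups or y arrays.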
Methods
- clone(): Create a new instance of the class with all parameters copied over.
- get_n_splits(dataset): Get the number of splits.
- get_params([deep]): Get parameters for this algorithm.
- set_params(**params): Set the parameters of this Algorithm.
- split(dataset): Split the dataset into train and test sets.
- __init__(base_splitter: int | BaseCrossValidator | Iterator | None = None, *, groupby: list[str] | str | None = None, stratify: list[str] | str | None = None)
- clone() -> Self
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- get_params(deep: bool = True) -> dict[str, Any]
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).
- Returns:
- params
Parameter names mapped to their values.
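The double-underscore prefix can be illustrated with a small pure-Python sketch. The classes Inner and Outer below are hypothetical stand-ins that mimic the sklearn-style convention; they are not tpcp's implementation of get_params.

```python
# Hypothetical classes mimicking the nested-parameter naming convention.
class Inner:
    def __init__(self, n_splits=5):
        self.n_splits = n_splits

    def get_params(self, deep=True):
        return {"n_splits": self.n_splits}


class Outer:
    def __init__(self, inner, groupby=None):
        self.inner = inner
        self.groupby = groupby

    def get_params(self, deep=True):
        params = {"inner": self.inner, "groupby": self.groupby}
        if deep:
            # Iterate over a copy, because new keys are added to `params`.
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        # Nested params get the "<name>__" prefix.
                        params[f"{name}__{sub_name}"] = sub_value
        return params


outer = Outer(Inner(n_splits=3), groupby="participant")
deep_keys = sorted(outer.get_params(deep=True))
shallow_keys = sorted(outer.get_params(deep=False))

print(deep_keys)
print(shallow_keys)
```

With deep=True the nested value appears under the key inner__n_splits in addition to the nested object itself under inner; with deep=False only the top-level parameters are returned.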