
class tpcp.optimize.GridSearch(pipeline: PipelineT, parameter_grid: ParameterGrid, *, scoring: Callable[[PipelineT, DatasetT], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[PipelineT, DatasetT], n_jobs: int | None = None, return_optimized: bool | str = True, pre_dispatch: int | str = 'n_jobs', progress_bar: bool = True)

Perform a grid search over various parameters.

The pipeline is scored on every data point in the provided dataset for every parameter combination in parameter_grid. The per-data-point scores are then aggregated over the entire dataset for each parameter combination. By default, this aggregation is a simple average.


This differs from how grid search works in many other settings, where the performance metric is usually calculated on all data points at once. Here, each data point represents an entire participant or recording (depending on the dataset). Therefore, the pipeline and the scoring method are expected to provide one result/score per data point in the dataset. Note that what you consider a “data point” in the context of your analysis is still up to you. The run method of the pipeline can still process multiple items, e.g., gait tests, in a loop and generate a single output if you consider a single participant one data point.
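The scoring scheme described above can be illustrated with a small, library-free sketch. It is not tpcp's implementation: the toy dataset, the `score` function, and the parameter names `threshold` and `scale` are made up for illustration. It only shows the idea of computing one score per data point and averaging those scores per parameter combination.

```python
from itertools import product
from statistics import mean

# Hypothetical toy dataset: one entry per data point (e.g. one participant).
dataset = {
    "participant_1": [0.2, 0.9, 0.4, 0.8],
    "participant_2": [0.7, 0.1, 0.6, 0.95],
}

# Hypothetical search space; every combination of these values is tested.
parameter_grid = {"threshold": [0.3, 0.5], "scale": [1.0, 2.0]}


def score(params, recording):
    """Return a single score for a single data point.

    Here: the fraction of scaled samples that exceed the threshold.
    """
    above = [value * params["scale"] > params["threshold"] for value in recording]
    return sum(above) / len(above)


def grid_search(dataset, parameter_grid):
    """Score every data point for every parameter combination, then average."""
    names = list(parameter_grid)
    results = []
    for values in product(*(parameter_grid[name] for name in names)):
        params = dict(zip(names, values))
        # One score per data point ...
        single_scores = [score(params, recording) for recording in dataset.values()]
        # ... aggregated to one value per parameter combination (simple average).
        results.append(
            {"params": params, "single_scores": single_scores, "score": mean(single_scores)}
        )
    return results


results = grid_search(dataset, parameter_grid)
best = max(results, key=lambda row: row["score"])
```

In this sketch, `best["params"]` holds the combination with the highest averaged score, while `single_scores` keeps the per-data-point values that went into the average.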


Parameters:

pipeline

The pipeline object to optimize.


parameter_grid

A sklearn parameter grid (ParameterGrid) that defines the search space.


scoring

A callable that can score a single data point given a pipeline. This function should return either a single score or a dictionary of scores.

Note that if scoring returns a dictionary, return_optimized must be set to the name of the score that should be used for ranking.


n_jobs

The number of processes used to parallelize the search. None means 1, while -1 means as many processes as there are logical processing cores.


pre_dispatch

The number of jobs that should be pre-dispatched. For an explanation, see the documentation of sklearn's GridSearchCV.


return_optimized

If True, a pipeline object with the overall best parameters is created and stored as optimized_pipeline_. If scoring returns a dictionary of score values, this must be a str corresponding to the name of the score that should be used to rank the results. If False, the respective result attributes will not be populated. If multiple parameter combinations have the same score, the one tested first will be used. By default, the parameter combination with the best rank (i.e. the highest score) is selected. If you want to select the combination with the lowest score instead, set return_optimized to the name of the score prefixed with a minus sign, e.g. -rmse. In case of a single score, use -score to select the combination with the lowest score.


progress_bar

True/False to enable/disable a tqdm progress bar.
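The selection rules for return_optimized can be summarized in a short sketch. The helper `select_best` below is hypothetical and not part of tpcp. It only mirrors the behaviour described above: True ranks by the single score named "score", a string ranks by that score name, a leading minus sign selects the lowest value, and ties go to the combination tested first.

```python
def select_best(aggregated_scores, return_optimized):
    """Return the index of the selected parameter combination, or None.

    `aggregated_scores` maps a score name to a list with one aggregated
    value per parameter combination (hypothetical input format).
    """
    if return_optimized is False:
        # No best result is determined at all.
        return None
    if return_optimized is True:
        # Single-score case: the generic name "score" is used.
        name, lower_is_better = "score", False
    elif return_optimized.startswith("-"):
        # A leading minus sign selects the lowest value of the named score.
        name, lower_is_better = return_optimized[1:], True
    else:
        name, lower_is_better = return_optimized, False

    scores = aggregated_scores[name]
    best_value = min(scores) if lower_is_better else max(scores)
    # list.index returns the first match, so ties go to the
    # combination that was tested first.
    return scores.index(best_value)


multi = {"f1": [0.8, 0.9, 0.9], "rmse": [2.0, 1.5, 1.5]}
best_by_f1 = select_best(multi, "f1")
best_by_rmse = select_best(multi, "-rmse")
```

Here both calls select index 1: it is the first combination with the highest f1 value and also the first one with the lowest rmse value.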

Other Parameters:

dataset

The dataset instance passed to the optimize method.


Attributes:

gs_results_

A dictionary summarizing all results of the grid search. The format of this dictionary is designed to be passed directly into the DataFrame constructor. Each row of the resulting DataFrame then represents the result for one set of parameters.

The dictionary contains the following entries:


The value of a respective parameter


A dictionary representing all parameters

score / {scorer-name}

The aggregated value of a score over all data points. If a single score is used for scoring, the generic name “score” is used. Otherwise, one column per scorer exists, named after the respective scorer.

rank__score / rank__{scorer-name}

The rank of each parameter combination for the respective score, sorted from the highest to the lowest value. Whether lower or higher values are better depends on the scoring function, so the ranks need to be interpreted accordingly.

single__score / single__{scorer-name}

The individual scores per data point for each parameter combination. This is a list of values of length len(dataset).


A list of data labels in the order the single score values are provided. These can be used to associate the single__score values with a certain data point.


optimized_pipeline_

An instance of the input pipeline with the best parameter set. This is only available if return_optimized is not False.


best_params_

The parameter dict that resulted in the best result. This is only available if return_optimized is not False.


best_index_

The index of the best result row in the output. This is only available if return_optimized is not False.


best_score_

The score of the best result. In a multimetric case, only the value of the scorer specified by return_optimized is provided. This is only available if return_optimized is not False.


multimetric_

Whether the scorer returned multiple scores.
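To make the layout of the results dictionary more concrete, here is a library-free sketch that builds a dictionary with the documented score, rank__score, and single__score entries and converts it into one record per parameter combination. The values are invented, the remaining entries of the real dictionary are omitted, and the tie rule in `rank_descending` (tied scores share the smallest rank) is an assumption rather than something stated above.

```python
def rank_descending(scores):
    """Rank 1 is the highest score; tied scores share the smallest rank (assumed)."""
    return [1 + sum(other > score for other in scores) for score in scores]


# Invented per-data-point scores: one inner list per parameter combination,
# one value per data point (so each inner list has len(dataset) entries).
single_scores = [[0.5, 0.75], [1.0, 0.75], [0.75, 0.5]]

# Aggregate each combination with a simple average.
scores = [sum(values) / len(values) for values in single_scores]

# Column-oriented layout: every key maps to a list with one entry
# per parameter combination.
gs_results = {
    "score": scores,
    "rank__score": rank_descending(scores),
    "single__score": single_scores,
}

# Turning the columns into rows gives one record per parameter combination,
# which is what a table built from this dictionary would show.
rows = [dict(zip(gs_results, values)) for values in zip(*gs_results.values())]

# The best combination is the first one with rank 1.
best_index = gs_results["rank__score"].index(1)
```

Because every entry is a list of equal length, a dictionary shaped like this can be handed to a tabular constructor as is, with the keys becoming column names.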



clone()

Create a new instance of the class with all parameters copied over.


get_params([deep])

Get parameters for this algorithm.

optimize(dataset, **_)

Run the grid search over the dataset and find the best parameter combination.


run(datapoint)

Run the optimized pipeline.


safe_run(datapoint)

Run the optimized pipeline.


set_params(**params)

Set the parameters of this Algorithm.

__init__(pipeline: PipelineT, parameter_grid: ParameterGrid, *, scoring: Callable[[PipelineT, DatasetT], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[PipelineT, DatasetT], n_jobs: int | None = None, return_optimized: bool | str = True, pre_dispatch: int | str = 'n_jobs', progress_bar: bool = True) -> None

_format_results(candidate_params, out)

Format the final result dict.

This function is adapted from sklearn’s BaseSearchCV.

clone() -> Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and of all nested objects.

get_params(deep: bool = True) -> dict[str, Any]

Get parameters for this algorithm.


deep

Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).


Parameter names mapped to their values.
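The effect of deep can be shown with a toy example. The classes below (`ToyAlgorithm`, `Smoother`, `ToyPipeline`) are hypothetical stand-ins and not tpcp classes. They only sketch the naming scheme described above, in which parameters of nested objects appear under keys with the `nested_object_name__` prefix.

```python
import inspect


class ToyAlgorithm:
    """Hypothetical minimal base class; not tpcp's implementation."""

    def get_params(self, deep=True):
        # Parameter names are taken from the __init__ signature.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, parameters of nested objects are added
            # using the "<name>__" prefix (two underscores).
            if deep and hasattr(value, "get_params"):
                for nested_name, nested_value in value.get_params(deep=True).items():
                    params[f"{name}__{nested_name}"] = nested_value
        return params


class Smoother(ToyAlgorithm):
    def __init__(self, cutoff=5.0):
        self.cutoff = cutoff


class ToyPipeline(ToyAlgorithm):
    def __init__(self, smoother=None, threshold=0.5):
        self.smoother = smoother
        self.threshold = threshold


pipe = ToyPipeline(smoother=Smoother(cutoff=2.0), threshold=0.3)
shallow = pipe.get_params(deep=False)  # keys: smoother, threshold
nested = pipe.get_params(deep=True)  # additionally: smoother__cutoff
```

With deep=False only the top-level parameters are returned, while deep=True also exposes `smoother__cutoff`.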

optimize(dataset: DatasetT, **_: Any) -> Self

Run the grid search over the dataset and find the best parameter combination.


dataset

The dataset used for optimization.

run(datapoint: DatasetT) -> PipelineT

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

safe_run(datapoint: DatasetT) -> PipelineT

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

set_params(**params: Any) -> Self

Set the parameters of this Algorithm.

To set parameters of nested objects, use keyword arguments of the form nested_object_name__para_name=value.
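A minimal sketch of how such double-underscore names can be routed to nested objects is shown below. The classes are hypothetical toy stand-ins and not tpcp's implementation. The sketch assumes that unknown parameter names should raise a ValueError, which is a choice made here for illustration.

```python
class ToyAlgorithm:
    """Hypothetical minimal base class; not tpcp's implementation."""

    def set_params(self, **params):
        for key, value in params.items():
            # Split "smoother__cutoff" into "smoother" and "cutoff".
            name, separator, nested_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Unknown parameter: {name!r}")
            if separator:
                # Delegate the remaining part of the name to the nested object.
                getattr(self, name).set_params(**{nested_key: value})
            else:
                setattr(self, name, value)
        # Returning self allows chaining calls.
        return self


class Smoother(ToyAlgorithm):
    def __init__(self, cutoff=5.0):
        self.cutoff = cutoff


class ToyPipeline(ToyAlgorithm):
    def __init__(self, smoother=None, threshold=0.5):
        self.smoother = smoother
        self.threshold = threshold


pipe = ToyPipeline(smoother=Smoother(cutoff=2.0), threshold=0.3)
returned = pipe.set_params(threshold=0.7, smoother__cutoff=8.0)
```

After the call, `pipe.threshold` is 0.7 and `pipe.smoother.cutoff` is 8.0. A grid search can use the same naming scheme in its parameter grid to vary parameters of nested objects.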

Examples using tpcp.optimize.GridSearch

Grid Search optimal Algorithm Parameter


Optimizable Pipelines




Custom Optuna Optimizer


Build-in Optuna Optimizers


Dataclass and Attrs support
