tpcp.OptimizablePipeline

class tpcp.OptimizablePipeline(*args, **kwds)

Pipeline with custom ways to optimize and/or train input parameters.

OptimizablePipelines are expected to implement a concrete way to train internal models or optimize parameters. This should not be a reimplementation of GridSearch or similar methods; for that, use tpcp.optimize.GridSearch directly.

It is important that self_optimize only modifies input parameters of the pipeline that are marked as OptimizableParameter. This means that a parameter optimized by self_optimize should be named in __init__, should be exported when calling pipeline.get_params, and should be annotated with the OptimizableParameter type hint at class level. For documentation purposes (and potential automatic checks in the future), it also makes sense to add the HyperParameter type annotation to all parameters that act as hyperparameters for the optimization performed in self_optimize. To learn more about parameter annotations, check this example and the optimization guide in the docs.

It is also possible to optimize nested parameters. For example, if an input of the pipeline is an algorithm or another pipeline, all parameters of these objects can be modified as well. In any case, make sure that all optimized parameters are retained when you call .clone() on the optimized pipeline.
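The following stdlib-only sketch illustrates this contract. MiniBase, Detector, and DetectorPipeline are hypothetical stand-ins written for this illustration, not tpcp classes: MiniBase only mimics the idea that get_params returns the __init__ arguments and that clone rebuilds an object from them. In real code you would subclass tpcp.OptimizablePipeline and use the type annotations instead; the comments mark where they would apply.

```python
import copy
import inspect
from statistics import mean


class MiniBase:
    """Hypothetical stand-in for tpcp's parameter handling (not tpcp code)."""

    def get_params(self):
        # Input parameters are exactly the named arguments of __init__.
        names = [n for n in inspect.signature(type(self).__init__).parameters if n != "self"]
        return {n: getattr(self, n) for n in names}

    def clone(self):
        # Rebuild from deep-copied input parameters; any other attribute is lost.
        return type(self)(**copy.deepcopy(self.get_params()))


class Detector(MiniBase):
    def __init__(self, threshold=0.0):
        self.threshold = threshold  # the value that self_optimize will set


class DetectorPipeline(MiniBase):
    def __init__(self, detector, margin=0.1):
        self.detector = detector  # modified by self_optimize (OptimizableParameter in tpcp)
        self.margin = margin  # hyperparameter of self_optimize (HyperParameter in tpcp)

    def self_optimize(self, dataset):
        # Clone first so the object passed in by the caller is not mutated.
        self.detector = self.detector.clone()
        # Store the result in a (nested) input parameter so clone() retains it.
        self.detector.threshold = mean(dataset) + self.margin
        return self


original = Detector()
pipe = DetectorPipeline(original).self_optimize([1.0, 2.0, 3.0])
restored = pipe.clone()
print(restored.detector.threshold)  # the optimized value survives cloning
print(original.threshold)  # the caller's detector is unchanged
```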

Methods

clone()

Create a new instance of the class with all parameters copied over.

get_params([deep])

Get parameters for this algorithm.

run(datapoint)

Run the pipeline.

safe_run(datapoint)

Run the pipeline with some additional checks.

score(datapoint)

Calculate performance of the pipeline on a datapoint with reference information.

self_optimize(dataset, **kwargs)

Optimize the input parameters of the pipeline or algorithm using any logic.

self_optimize_with_info(dataset, **kwargs)

Optimize the input parameters of the pipeline or algorithm using any logic.

set_params(**params)

Set the parameters of this algorithm.

__init__(*args, **kwargs)
clone() -> Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and of all nested objects.

get_params(deep: bool = True) -> Dict[str, Any]

Get parameters for this algorithm.

Parameters:
deep

Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).

Returns:
params

Parameter names mapped to their values.
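A minimal stdlib sketch of the deep naming scheme, assuming a scikit-learn-style layout in which the nested object itself is also listed under its plain name. ParamsMixin, Smoother, and Pipeline are hypothetical names, and tpcp's actual implementation may differ in detail.

```python
import inspect


class ParamsMixin:
    """Hypothetical stand-in illustrating the deep get_params naming scheme."""

    def get_params(self, deep=True):
        names = [n for n in inspect.signature(type(self).__init__).parameters if n != "self"]
        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            if deep and isinstance(value, ParamsMixin):
                # Parameters of nested objects get the prefix "<name>__".
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class Smoother(ParamsMixin):
    def __init__(self, cutoff=5.0, order=2):
        self.cutoff = cutoff
        self.order = order


class Pipeline(ParamsMixin):
    def __init__(self, smoother, scale=1.0):
        self.smoother = smoother
        self.scale = scale


pipe = Pipeline(Smoother(cutoff=3.0))
print(sorted(pipe.get_params(deep=True)))
# ['scale', 'smoother', 'smoother__cutoff', 'smoother__order']
print(sorted(pipe.get_params(deep=False)))
# ['scale', 'smoother']
```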

run(datapoint: DatasetT) -> Self

Run the pipeline.

Note

It is usually preferred to use safe_run on custom pipelines instead of run, as safe_run can catch certain implementation errors of the run method.

Parameters:
datapoint

An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.

Returns:
self

The class instance with all result attributes populated.

safe_run(datapoint: DatasetT) -> Self

Run the pipeline with some additional checks.

It is preferred to use this method over run, as it can catch some simple implementation errors of custom pipelines.

The following things are checked:

  • The run method must return self (or at least an instance of the pipeline)

  • The run method must set result attributes on the pipeline

  • All result attributes must have a trailing _ in their name

  • The run method must not modify the input parameters of the pipeline

Parameters:
datapoint

An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.

Returns:
self

The class instance with all result attributes populated.
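The four checks listed for safe_run can be approximated as follows. checked_run, Doubler, and Sloppy are hypothetical names, and this is not tpcp's safe_run implementation. The sketch assumes the pipeline exposes get_params and has not been run before, because result attributes left over from an earlier run would not be detected as new.

```python
import copy


def checked_run(pipeline, datapoint):
    """Hypothetical helper mimicking the safe_run checks (not tpcp code)."""
    # Snapshot the input parameters and the attribute names before running.
    params_before = copy.deepcopy(pipeline.get_params())
    attrs_before = set(vars(pipeline))

    result = pipeline.run(datapoint)

    # 1. run must return the pipeline instance (usually self).
    if not isinstance(result, type(pipeline)):
        raise ValueError("run must return self")
    # 2. run must set at least one new (result) attribute.
    new_attrs = set(vars(result)) - attrs_before
    if not new_attrs:
        raise ValueError("run did not set any result attributes")
    # 3. Result attributes need a trailing underscore.
    bad = sorted(a for a in new_attrs if not a.endswith("_"))
    if bad:
        raise ValueError(f"result attributes without trailing '_': {bad}")
    # 4. Input parameters must be unchanged.
    if result.get_params() != params_before:
        raise ValueError("run modified input parameters")
    return result


class Doubler:
    def __init__(self, factor=2):
        self.factor = factor

    def get_params(self):
        return {"factor": self.factor}

    def run(self, datapoint):
        self.result_ = datapoint * self.factor  # correct: trailing underscore
        return self


class Sloppy(Doubler):
    def run(self, datapoint):
        self.result = datapoint * self.factor  # wrong: missing trailing underscore
        return self


print(checked_run(Doubler(), 21).result_)  # 42
try:
    checked_run(Sloppy(), 21)
except ValueError as err:
    print(err)  # result attributes without trailing '_': ['result']
```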

score(datapoint: DatasetT) -> Union[float, Dict[str, float]]

Calculate performance of the pipeline on a datapoint with reference information.

This is an optional method and does not need to be implemented in many cases. Usually, standalone functions are better suited as scorers.

A typical score method will call self.run(datapoint) and then compare the results with reference values also available on the dataset.

Parameters:
datapoint

An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data and the available reference information will depend on the dataset.

Returns:
score

A float or dict of floats quantifying the quality of the pipeline on the provided data. A higher score is always better.
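As a sketch of the typical score pattern (run first, then compare with reference values), the example below uses hypothetical stand-ins: Datapoint replaces a single-datapoint tpcp.Dataset, and PeakPipeline is an invented pipeline. Returning a dict lets precision and recall be reported separately, and both follow the higher-is-better convention.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    """Hypothetical stand-in for a single-datapoint tpcp.Dataset."""

    signal: list
    reference_peaks: list  # reference information available on the dataset


class PeakPipeline:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def run(self, datapoint):
        # Result attribute: indices of samples above the threshold.
        self.peaks_ = [i for i, v in enumerate(datapoint.signal) if v > self.threshold]
        return self

    def score(self, datapoint):
        # Typical pattern: run first, then compare against the reference.
        self.run(datapoint)
        found = set(self.peaks_)
        expected = set(datapoint.reference_peaks)
        true_pos = len(found & expected)
        precision = true_pos / len(found) if found else 0.0
        recall = true_pos / len(expected) if expected else 0.0
        # Both metrics follow the "higher is better" convention.
        return {"precision": precision, "recall": recall}


dp = Datapoint(signal=[0.1, 0.9, 0.2, 0.7, 0.6], reference_peaks=[1, 3])
print(PeakPipeline(threshold=0.5).score(dp))
# {'precision': 0.6666666666666666, 'recall': 1.0}
```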

self_optimize(dataset: DatasetT, **kwargs) -> Self

Optimize the input parameters of the pipeline or algorithm using any logic.

This method can be used to adapt the input parameters (the values provided to __init__) based on any data-driven heuristic.

Note

The optimization must only modify the input parameters (i.e., self.clone() should retain the optimization results). If you need to return further information, implement self_optimize_with_info instead.

Parameters:
dataset

An instance of a tpcp.Dataset containing one or multiple data points that can be used for training. The structure of the data and the available reference information will depend on the dataset.

kwargs

Additional parameters required for the optimization process.

Returns:
self

The class instance with optimized input parameters.

self_optimize_with_info(dataset: DatasetT, **kwargs) -> Tuple[Self, Any]

Optimize the input parameters of the pipeline or algorithm using any logic.

This is equivalent to self_optimize, but allows you to return additional information as a second return value. If you implement this method, there is no need to implement self_optimize as well.

Parameters:
dataset

An instance of a tpcp.Dataset containing one or multiple data points that can be used for training. The structure of the data and the available reference information will depend on the dataset.

kwargs

Additional parameters required for the optimization process.

Returns:
self

The class instance with optimized input parameters.

info

An arbitrary piece of information.
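A sketch of how the two optimization methods relate. InfoBase and ThresholdPipeline are hypothetical, and the assumption that a default self_optimize simply calls self_optimize_with_info and drops the info is inferred from the description above rather than taken from tpcp's source.

```python
from statistics import mean


class InfoBase:
    """Hypothetical stand-in: self_optimize delegates to self_optimize_with_info."""

    def self_optimize(self, dataset, **kwargs):
        # Discard the extra information and return only the optimized pipeline.
        optimized, _info = self.self_optimize_with_info(dataset, **kwargs)
        return optimized


class ThresholdPipeline(InfoBase):
    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def self_optimize_with_info(self, dataset, **kwargs):
        values = list(dataset)
        # The optimization result goes into an input parameter ...
        self.threshold = mean(values)
        # ... while anything else is returned as the info object.
        info = {"n_samples": len(values), "spread": max(values) - min(values)}
        return self, info


pipe, info = ThresholdPipeline().self_optimize_with_info([1.0, 2.0, 6.0])
print(pipe.threshold, info)  # 3.0 {'n_samples': 3, 'spread': 5.0}
print(ThresholdPipeline().self_optimize([1.0, 2.0, 6.0]).threshold)  # 3.0
```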

set_params(**params: Any) -> Self

Set the parameters of this algorithm.

To set parameters of nested objects, pass keyword arguments named nested_object_name__para_name.
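The following sketch shows one way nested_object_name__para_name keys could be routed to nested objects. SettableMixin, Smoother, and Pipeline are hypothetical names, not tpcp's implementation. Direct parameters are applied before nested ones, so a nested object replaced in the same call receives the nested values.

```python
import inspect


class SettableMixin:
    """Hypothetical stand-in showing how 'nested__param' keys can be routed."""

    def _param_names(self):
        return [n for n in inspect.signature(type(self).__init__).parameters if n != "self"]

    def set_params(self, **params):
        valid = self._param_names()
        nested = {}
        for key, value in params.items():
            # "smoother__cutoff" -> ("smoother", "__", "cutoff")
            name, sep, sub_key = key.partition("__")
            if name not in valid:
                raise ValueError(f"invalid parameter {name!r} for {type(self).__name__}")
            if sep:
                # Collect nested keys so they can be forwarded to the child object.
                nested.setdefault(name, {})[sub_key] = value
            else:
                setattr(self, name, value)
        # Forward nested values after all direct parameters have been set.
        for name, sub_params in nested.items():
            getattr(self, name).set_params(**sub_params)
        return self


class Smoother(SettableMixin):
    def __init__(self, cutoff=5.0):
        self.cutoff = cutoff


class Pipeline(SettableMixin):
    def __init__(self, smoother, scale=1.0):
        self.smoother = smoother
        self.scale = scale


pipe = Pipeline(Smoother()).set_params(scale=2.0, smoother__cutoff=3.0)
print(pipe.scale, pipe.smoother.cutoff)  # 2.0 3.0
```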

Examples using tpcp.OptimizablePipeline

Optimizable Pipelines

GridSearchCV

Cross Validation

Optimization Info