Pipeline

class tpcp.Pipeline

Base class for all custom pipelines.

To create your own custom pipeline, subclass this class and implement run.
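
As a rough sketch of the subclassing pattern: parameters are stored unchanged in `__init__`, and `run` writes its results to attributes with a trailing underscore and returns `self`. So that the snippet runs without tpcp installed, `Pipeline` below is a bare stand-in class; in real code you would use `from tpcp import Pipeline` instead. `ThresholdCounter`, `threshold`, and `n_above_` are hypothetical names, and the datapoint is a plain list rather than a `tpcp.Dataset`.

```python
class Pipeline:
    """Stand-in for tpcp.Pipeline (assumption: replace with `from tpcp import Pipeline`)."""

    def run(self, datapoint):
        raise NotImplementedError


class ThresholdCounter(Pipeline):
    """Hypothetical pipeline that counts the values above a threshold."""

    def __init__(self, threshold: float = 0.5):
        # Parameters are stored as-is, without validation or modification.
        self.threshold = threshold

    def run(self, datapoint):
        # Results are written to attributes with a trailing underscore.
        self.n_above_ = sum(1 for value in datapoint if value > self.threshold)
        # run returns the pipeline instance itself.
        return self


pipeline = ThresholdCounter(threshold=1.0).run([0.2, 1.5, 3.0])
print(pipeline.n_above_)  # 2
```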

Methods

`clone`()	Create a new instance of the class with all parameters copied over.
`get_params`([deep])	Get parameters for this algorithm.
`run`(datapoint)	Run the pipeline.
`safe_run`(datapoint)	Run the pipeline with some additional checks.
`score`(datapoint)	Calculate performance of the pipeline on a datapoint with reference information.
`set_params`(**params)	Set the parameters of this algorithm.

__init__(*args, **kwargs)

clone() → Self

Create a new instance of the class with all parameters copied over.

This creates a new instance of the class itself and of all nested objects.

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this algorithm.

Parameters:

deep: Only relevant if the object contains nested algorithm objects. In that case, if deep is True, the parameters of these nested objects are included in the output with a prefix like nested_object_name__ (note the two underscores at the end).

Returns:

params: Parameter names mapped to their values.
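
The double-underscore naming can be illustrated with a simplified stand-in. The `Params` class below is not tpcp's implementation; it only mimics the described behaviour by reading the `__init__` signature. `Filter`, `MyPipeline`, `cutoff`, `window`, and `filter_algo` are hypothetical names.

```python
import inspect


class Params:
    """Simplified stand-in for the parameter handling (not tpcp's actual code)."""

    def get_params(self, deep=True):
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # Nested objects contribute their parameters as "<name>__<param>".
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class Filter(Params):
    def __init__(self, cutoff=5.0):
        self.cutoff = cutoff


class MyPipeline(Params):
    def __init__(self, filter_algo, window=10):
        self.filter_algo = filter_algo
        self.window = window


pipeline = MyPipeline(Filter(cutoff=2.0), window=20)
shallow = pipeline.get_params(deep=False)
deep = pipeline.get_params(deep=True)
print(sorted(shallow))  # ['filter_algo', 'window']
print(sorted(deep))     # ['filter_algo', 'filter_algo__cutoff', 'window']
```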

run(datapoint: DatasetT) → Self

Run the pipeline.

Note

It is usually preferred to use safe_run on custom pipelines instead of run, as safe_run can catch certain implementation errors of the run method.

Parameters:

datapoint: An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.

Returns:

self: The class instance with all result attributes populated

safe_run(datapoint: DatasetT) → Self

Run the pipeline with some additional checks.

It is preferred to use this method over run, as it can catch some simple implementation errors of custom pipelines.

The following things are checked:

The run method must return self (or at least an instance of the pipeline)
The run method must set result attributes on the pipeline
All result attributes must have a trailing _ in their name
The run method must not modify the input parameters of the pipeline

Parameters:

datapoint: An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.

Returns:

self: The class instance with all result attributes populated
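
To make the four checks listed above concrete, here is a minimal sketch of what such a wrapper could look like. `checked_run` is a hypothetical helper, not tpcp's implementation. It assumes a freshly created pipeline and treats every attribute without a trailing underscore as an input parameter; `GoodPipeline` and `BadPipeline` are plain stand-in classes.

```python
import copy


def checked_run(pipeline, datapoint):
    """Call pipeline.run and apply simplified versions of the listed checks."""
    params_before = {
        name: copy.deepcopy(value)
        for name, value in vars(pipeline).items()
        if not name.endswith("_")
    }
    result = pipeline.run(datapoint)

    # Check 1: run must return self (or at least an instance of the pipeline).
    if not isinstance(result, type(pipeline)):
        raise TypeError("run must return self")

    # Check 2: run must set at least one result attribute.
    attrs_after = vars(result)
    if not any(name.endswith("_") for name in attrs_after):
        raise ValueError("run did not set any result attributes")

    # Check 3: every newly added attribute needs a trailing underscore.
    unexpected = sorted(set(attrs_after) - set(params_before))
    unexpected = [name for name in unexpected if not name.endswith("_")]
    if unexpected:
        raise ValueError(f"results without trailing underscore: {unexpected}")

    # Check 4: run must not modify the input parameters.
    for name, old_value in params_before.items():
        if attrs_after[name] != old_value:
            raise ValueError(f"run modified the parameter {name!r}")
    return result


class GoodPipeline:
    def __init__(self, factor=2):
        self.factor = factor

    def run(self, datapoint):
        self.scaled_ = [self.factor * value for value in datapoint]
        return self


class BadPipeline(GoodPipeline):
    def run(self, datapoint):
        self.factor = 10  # Implementation error: changes an input parameter.
        return super().run(datapoint)


print(checked_run(GoodPipeline(), [1, 2]).scaled_)  # [2, 4]
try:
    checked_run(BadPipeline(), [1, 2])
except ValueError as error:
    print(error)  # run modified the parameter 'factor'
```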

score(datapoint: DatasetT) → float | dict[str, float]

Calculate performance of the pipeline on a datapoint with reference information.

This is an optional method and does not need to be implemented in many cases. Standalone functions are usually better suited as scorers.

A typical score method will call self.run(datapoint) and then compare the results with reference values also available on the dataset.

Parameters:

datapoint: An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data and the available reference information will depend on the dataset.

Returns:

score: A float or a dict of floats quantifying the performance of the pipeline on the provided data. A higher score is always better.
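
A typical score method might look like the following sketch. Everything here is a stand-in: `Datapoint` imitates a single-datapoint dataset with hypothetical `data` and `reference_count` fields, and `PeakCounter` is a plain class that would subclass `tpcp.Pipeline` in real code. The absolute error is negated so that a higher score is better.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    """Hypothetical stand-in for a tpcp.Dataset holding a single datapoint."""

    data: list
    reference_count: int


class PeakCounter:
    """Stand-in pipeline; in real code this would subclass tpcp.Pipeline."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def run(self, datapoint):
        self.count_ = sum(1 for value in datapoint.data if value > self.threshold)
        return self

    def score(self, datapoint):
        # Run the pipeline, then compare the result with the reference value.
        self.run(datapoint)
        # Negated absolute error, so that higher values mean better performance.
        return -float(abs(self.count_ - datapoint.reference_count))


point = Datapoint(data=[0.5, 1.5, 2.5, 0.1], reference_count=3)
score = PeakCounter(threshold=1.0).score(point)
print(score)  # -1.0
```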

set_params(**params: Any) → Self

Set the parameters of this algorithm.

To set parameters of nested objects, use the form nested_object_name__para_name=value.
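
How a nested parameter name is routed can be sketched with a small helper function. This `set_params` function is a simplified stand-in, not tpcp's code: it splits each key at the first double underscore and does not validate unknown parameter names. `Filter`, `MyPipeline`, `cutoff`, and `window` are hypothetical names.

```python
def set_params(obj, **params):
    """Simplified stand-in: route `a__b=value` to the attribute `b` of `obj.a`."""
    for key, value in params.items():
        name, separator, sub_key = key.partition("__")
        if separator:
            # Everything after the first "__" is passed on to the nested object.
            set_params(getattr(obj, name), **{sub_key: value})
        else:
            setattr(obj, name, value)
    return obj


class Filter:
    def __init__(self, cutoff=5.0):
        self.cutoff = cutoff


class MyPipeline:
    def __init__(self, filter_algo, window=10):
        self.filter_algo = filter_algo
        self.window = window


pipeline = MyPipeline(Filter())
set_params(pipeline, window=5, filter_algo__cutoff=1.0)
print(pipeline.window, pipeline.filter_algo.cutoff)  # 5 1.0
```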

Examples using `tpcp.Pipeline`

Grid Search optimal Algorithm Parameter

Optimizable Pipelines

GridSearchCV

Custom Optuna Optimizer

Built-in Optuna Optimizers

Composite-Algorithms and Pipelines

Optimization Info
