Pipeline
- class tpcp.Pipeline
Base class for all custom pipelines.
To create your own custom pipeline, subclass this class and implement run. The run method is expected to operate on exactly one dataset datapoint/group.
Methods
clone(): Create a new instance of the class with all parameters copied over.
get_params([deep]): Get parameters for this algorithm.
run(datapoint): Run the pipeline.
safe_run(datapoint): Run the pipeline with some additional checks.
set_params(**params): Set the parameters of this algorithm.
- __init__(*args, **kwargs)
- clone() -> Self
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and of all nested objects.
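The cloning behaviour described above can be illustrated with a small standalone sketch. It does not import tpcp and is not tpcp's implementation: ToyAlgorithm, ToyFilter and ToyPipeline are hypothetical classes, and treating every constructor argument as a parameter is an assumption made for this example.

```python
from __future__ import annotations

import copy
import inspect


class ToyAlgorithm:
    """Hypothetical stand-in for a parameterised object (not a tpcp class)."""

    def get_params(self) -> dict:
        # Assumption of this sketch: every constructor argument is a parameter.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def clone(self):
        params = {}
        for name, value in self.get_params().items():
            # Nested objects are cloned as well; plain values are deep-copied.
            params[name] = value.clone() if hasattr(value, "clone") else copy.deepcopy(value)
        return type(self)(**params)


class ToyFilter(ToyAlgorithm):
    def __init__(self, window: list[int]):
        self.window = window


class ToyPipeline(ToyAlgorithm):
    def __init__(self, filter: ToyFilter, threshold: float = 0.5):
        self.filter = filter
        self.threshold = threshold

    def run(self, datapoint):
        self.n_above_ = sum(value > self.threshold for value in datapoint)
        return self


original = ToyPipeline(ToyFilter(window=[1, 2, 3])).run([0.2, 0.7, 0.9])
cloned = original.clone()

print(cloned is original)                # False: a new pipeline instance
print(cloned.filter is original.filter)  # False: the nested object is new too
print(cloned.filter.window == original.filter.window)  # True: same parameter values
```

In this sketch the clone is rebuilt from constructor arguments only, so anything set later on the instance (such as n_above_) is not carried over.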
- get_params(deep: bool = True) -> dict[str, Any]
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the parameters of these nested objects are included in the output using a prefix like
nested_object_name__ (note the two “_” at the end).
- Returns:
- params
Parameter names mapped to their values.
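The double-underscore prefix is easiest to see on a concrete nested object. The following is a minimal standalone sketch of the naming convention only. It does not use tpcp; the ToyAlgorithm, ToyFilter and ToyPipeline classes are hypothetical, and reading parameter names from the constructor signature is an assumption of this example.

```python
from __future__ import annotations

import inspect
from typing import Any


class ToyAlgorithm:
    """Hypothetical stand-in that mimics the documented naming convention."""

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        params: dict[str, Any] = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, parameters of nested objects are added under
            # "<name>__<nested parameter>" (two underscores as separator).
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class ToyFilter(ToyAlgorithm):
    def __init__(self, cutoff: float = 5.0):
        self.cutoff = cutoff


class ToyPipeline(ToyAlgorithm):
    def __init__(self, filter: ToyFilter, threshold: float = 0.5):
        self.filter = filter
        self.threshold = threshold


pipe = ToyPipeline(filter=ToyFilter(cutoff=2.0), threshold=0.1)
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)

print(sorted(shallow))  # ['filter', 'threshold']
print(sorted(deep))     # ['filter', 'filter__cutoff', 'threshold']
```

The nested object itself stays in the output under its own name; deep=True only adds the prefixed entries next to it.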
- run(datapoint: DatasetT) -> Self
Run the pipeline.
Note
It is usually preferred to use safe_run on custom pipelines instead of run, as safe_run can catch certain implementation errors of the run method. However, neither run nor safe_run verifies that datapoint actually represents only a single datapoint/group. Pipeline implementations should enforce this through dataset accessors and/or explicit assert_is_single(...) / assert_is_single_group(...) checks.
- Parameters:
- datapoint
An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.
- Returns:
- self
The class instance with all result attributes populated.
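One way to read the note above is that the single-datapoint guard lives in the dataset accessor, so that run fails early when it is handed more than one datapoint. The sketch below illustrates that pattern with a toy dataset. ToyDataset and MeanPipeline are hypothetical, and the assert_is_single method shown here is a stand-in with a made-up signature, not the tpcp.Dataset API.

```python
from __future__ import annotations


class ToyDataset:
    """Hypothetical stand-in for a dataset; not the tpcp.Dataset API."""

    def __init__(self, recordings: dict[str, list[float]]):
        self.recordings = recordings

    def __len__(self) -> int:
        return len(self.recordings)

    def assert_is_single(self, property_name: str) -> None:
        # Toy guard with a made-up signature: refuse anything but one datapoint.
        if len(self) != 1:
            raise ValueError(
                f"'{property_name}' requires exactly one datapoint, got {len(self)}."
            )

    @property
    def signal(self) -> list[float]:
        # The accessor enforces the single-datapoint contract itself.
        self.assert_is_single("signal")
        return next(iter(self.recordings.values()))


class MeanPipeline:
    """Toy pipeline whose run method relies on the guarded accessor."""

    def run(self, datapoint: ToyDataset):
        values = datapoint.signal
        self.mean_ = sum(values) / len(values)
        return self


single = ToyDataset({"rec_1": [1.0, 2.0, 3.0]})
both = ToyDataset({"rec_1": [1.0, 2.0, 3.0], "rec_2": [4.0, 5.0]})

print(MeanPipeline().run(single).mean_)  # 2.0

try:
    MeanPipeline().run(both)
except ValueError as error:
    print("rejected:", error)
```

Putting the check in the accessor means every pipeline that reads signal gets the guard without repeating it in each run method.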
- safe_run(datapoint: DatasetT) -> Self
Run the pipeline with some additional checks.
It is preferred to use this method over run, as it can catch some simple implementation errors of custom pipelines. It does not validate that the provided dataset instance contains only a single datapoint/group.
The following things are checked:
- The run method must return self (or at least an instance of the pipeline).
- The run method must set result attributes on the pipeline.
- All result attributes must have a trailing _ in their name.
- The run method must not modify the input parameters of the pipeline.
- Parameters:
- datapoint
An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.
- Returns:
- self
The class instance with all result attributes populated.
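To make the four checks listed for safe_run more concrete, here is a rough standalone sketch of what such a wrapper could look like. It is not tpcp's implementation: checked_run and the toy pipeline classes are hypothetical, a plain list stands in for the datapoint, and parameters are assumed to be the constructor arguments.

```python
from __future__ import annotations

import copy
import inspect


def param_names(obj) -> list[str]:
    """Constructor argument names, treated as the parameters in this sketch."""
    signature = inspect.signature(type(obj).__init__)
    return [name for name in signature.parameters if name != "self"]


def checked_run(pipeline, datapoint):
    """Hypothetical helper mirroring the four documented checks."""
    before = copy.deepcopy({n: getattr(pipeline, n) for n in param_names(pipeline)})
    output = pipeline.run(datapoint)

    # 1. run must return self (or at least an instance of the pipeline).
    if not isinstance(output, type(pipeline)):
        raise ValueError("run must return self (or an instance of the pipeline).")
    # 2. run must set result attributes.
    non_params = [n for n in vars(output) if n not in param_names(output)]
    if not non_params:
        raise ValueError("run must set at least one result attribute.")
    # 3. All result attributes need a trailing underscore.
    misnamed = [n for n in non_params if not n.endswith("_")]
    if misnamed:
        raise ValueError(f"Result attributes need a trailing '_': {misnamed}")
    # 4. run must not modify the input parameters.
    after = {n: getattr(output, n) for n in param_names(output)}
    if after != before:
        raise ValueError("run must not modify the pipeline parameters.")
    return output


class GoodPipeline:
    def __init__(self, scale: float = 2.0):
        self.scale = scale

    def run(self, datapoint):
        self.total_ = self.scale * sum(datapoint)
        return self


class ForgetsReturn(GoodPipeline):
    def run(self, datapoint):
        self.total_ = self.scale * sum(datapoint)  # no "return self"


class MutatesParams(GoodPipeline):
    def run(self, datapoint):
        self.scale = 10.0  # overwrites an input parameter
        self.total_ = self.scale * sum(datapoint)
        return self


result = checked_run(GoodPipeline(scale=2.0), [1, 2, 3])
print(result.total_)  # 12.0

for broken in (ForgetsReturn(), MutatesParams()):
    try:
        checked_run(broken, [1, 2, 3])
    except ValueError as error:
        print(type(broken).__name__, "->", error)
```

As in the description above, none of these checks looks at how many datapoints the input contains; that guard has to come from the pipeline or the dataset.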