tpcp.Pipeline
- class tpcp.Pipeline
Baseclass for all custom pipelines.
To create your own custom pipeline, subclass this class and implement run.
Methods
clone(): Create a new instance of the class with all parameters copied over.
get_params([deep]): Get parameters for this algorithm.
run(datapoint): Run the pipeline.
safe_run(datapoint): Run the pipeline with some additional checks.
score(datapoint): Calculate performance of the pipeline on a datapoint with reference information.
set_params(**params): Set the parameters of this Algorithm.
- __init__(*args, **kwargs)
- clone() -> Self
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
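The copying behaviour can be illustrated with a small standalone sketch. It does not import tpcp and is not the library's implementation: ToyFilter, ToyPipeline, and their parameters are hypothetical, and dropping attributes with a trailing underscore is an assumption of this sketch.

```python
import copy


class ToyFilter:
    """Hypothetical nested algorithm."""

    def __init__(self, cutoff: float = 5.0):
        self.cutoff = cutoff


class ToyPipeline:
    """Hypothetical pipeline holding a nested algorithm object."""

    def __init__(self, filter_algo: ToyFilter, window: int = 10):
        self.filter_algo = filter_algo
        self.window = window

    def clone(self):
        # Assumption: attributes with a trailing underscore are results,
        # everything else is a parameter.
        params = {k: v for k, v in vars(self).items() if not k.endswith("_")}
        # Deep-copy the parameters so nested objects are new instances too.
        return type(self)(**copy.deepcopy(params))


original = ToyPipeline(ToyFilter(cutoff=2.5), window=20)
original.result_ = "set by an earlier run"
copied = original.clone()

print(copied is original)                          # False
print(copied.filter_algo is original.filter_algo)  # False
print(copied.filter_algo.cutoff, copied.window)    # 2.5 20
```

Because the nested object is copied as well, changing copied.filter_algo.cutoff in this sketch leaves the original untouched.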
- get_params(deep: bool = True) -> Dict[str, Any]
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two "_" at the end).
- Returns:
- params
Parameter names mapped to their values.
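The double-underscore prefix is easiest to see in a small standalone sketch. The classes below are hypothetical stand-ins that mimic the naming convention described above and do not use tpcp itself. Keeping the nested object under its own name alongside the prefixed entries is an assumption of this sketch.

```python
from typing import Any, Dict


class ToyFilter:
    """Hypothetical nested algorithm with a single parameter."""

    def __init__(self, cutoff: float = 5.0):
        self.cutoff = cutoff

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"cutoff": self.cutoff}


class ToyPipeline:
    """Hypothetical pipeline that holds a nested algorithm object."""

    def __init__(self, filter_algo: ToyFilter, window: int = 10):
        self.filter_algo = filter_algo
        self.window = window

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        params: Dict[str, Any] = {
            "filter_algo": self.filter_algo,
            "window": self.window,
        }
        if deep:
            # Add the parameters of nested objects as "<name>__<param>".
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params


pipe = ToyPipeline(ToyFilter(cutoff=2.5), window=20)
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)

print(sorted(shallow))  # ['filter_algo', 'window']
print(sorted(deep))     # ['filter_algo', 'filter_algo__cutoff', 'window']
```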
- run(datapoint: DatasetT) -> Self
Run the pipeline.
Note
It is usually preferred to use safe_run on custom pipelines instead of run, as safe_run can catch certain implementation errors of the run method.
- Parameters:
- datapoint
An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.
- Returns:
- self
The class instance with all result attributes populated.
- safe_run(datapoint: DatasetT) -> Self
Run the pipeline with some additional checks.
It is preferred to use this method over run, as it can catch some simple implementation errors of custom pipelines.
The following things are checked:
The run method must return self (or at least an instance of the pipeline).
The run method must set result attributes on the pipeline.
All result attributes must have a trailing _ in their name.
The run method must not modify the input parameters of the pipeline.
- Parameters:
- datapoint
An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data will depend on the dataset.
- Returns:
- self
The class instance with all result attributes populated.
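The four checks listed for safe_run can be approximated in a standalone sketch. This is not tpcp's implementation: checked_run, ToyPipeline, and BrokenPipeline are hypothetical names, a plain list stands in for the dataset, and the sketch assumes that every attribute without a trailing underscore is a parameter.

```python
import copy


class ToyPipeline:
    """Hypothetical pipeline: parameters set in __init__, results set in run."""

    def __init__(self, factor: float = 2.0):
        self.factor = factor

    def run(self, datapoint):
        # Result attributes carry a trailing underscore.
        self.result_ = [value * self.factor for value in datapoint]
        return self


class BrokenPipeline(ToyPipeline):
    """Hypothetical pipeline whose run method forgets to return self."""

    def run(self, datapoint):
        self.result_ = list(datapoint)


def checked_run(pipeline, datapoint):
    """Run the pipeline and apply checks similar to the four listed above."""
    # Assumption: attributes without a trailing underscore are parameters.
    params_before = {
        name: copy.deepcopy(value)
        for name, value in vars(pipeline).items()
        if not name.endswith("_")
    }
    output = pipeline.run(datapoint)

    if not isinstance(output, type(pipeline)):
        raise TypeError("run must return self (or an instance of the pipeline)")
    result_names = set(vars(output)) - set(params_before)
    if not result_names:
        raise ValueError("run must set at least one result attribute")
    if any(not name.endswith("_") for name in result_names):
        raise ValueError("result attributes need a trailing underscore")
    if any(vars(output)[name] != value for name, value in params_before.items()):
        raise ValueError("run must not modify the input parameters")
    return output


print(checked_run(ToyPipeline(factor=3.0), [1, 2]).result_)  # [3.0, 6.0]

try:
    checked_run(BrokenPipeline(), [1, 2])
except TypeError as error:
    print(f"caught: {error}")
```

Calling run directly on BrokenPipeline would silently return None, which is the kind of mistake these checks are meant to surface early.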
- score(datapoint: DatasetT) -> float | Dict[str, float]
Calculate performance of the pipeline on a datapoint with reference information.
This is an optional method and does not need to be implemented in many cases. Usually, standalone functions are better suited as scorers.
A typical score method will call self.run(datapoint) and then compare the results with reference values also available on the dataset.
- Parameters:
- datapoint
An instance of a tpcp.Dataset containing only a single datapoint. The structure of the data and the available reference information will depend on the dataset.
- Returns:
- score
A float or dict of floats quantifying the quality of the pipeline on the provided data. A higher score is always better.
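The typical shape of a score method, as described above, might look like the following standalone sketch. ToyDatapoint and ThresholdPipeline are hypothetical and do not use tpcp. The datapoint is assumed to expose its samples and reference labels as plain attributes, and accuracy is chosen as the score so that higher is better.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDatapoint:
    """Hypothetical stand-in for a single datapoint with reference labels."""

    samples: List[float]
    reference_labels: List[int]


class ThresholdPipeline:
    """Hypothetical pipeline that labels samples above a threshold as 1."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def run(self, datapoint: ToyDatapoint):
        self.labels_ = [int(s > self.threshold) for s in datapoint.samples]
        return self

    def score(self, datapoint: ToyDatapoint) -> float:
        # Run the pipeline, then compare its results with the reference.
        predicted = self.run(datapoint).labels_
        matches = sum(p == r for p, r in zip(predicted, datapoint.reference_labels))
        return matches / len(predicted)


point = ToyDatapoint(samples=[0.1, 0.7, 0.9, 0.4], reference_labels=[0, 1, 0, 0])
accuracy = ThresholdPipeline(threshold=0.5).score(point)
print(accuracy)  # 0.75
```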
Examples using tpcp.Pipeline
Grid Search optimal Algorithm Parameter
Composite-Algorithms and Pipelines