Composite-Algorithms and Pipelines
Sometimes a pipeline or algorithm requires a list of parameters or nested objects. As we cannot support parameters whose names are not known when the class is defined, such cases need to be handled via composite fields.
A composite field is a parameter expecting a value of the shape [(name1, sub_para1), (name2, sub_para2), ...].
The sub-paras can themselves be tpcp objects.
Because it is difficult to know at runtime whether a parameter is expected to be a composite field, you need to explicitly
specify all fields that should be treated as composite fields during class definition, using the _composite_params
attribute:
import dataclasses
import traceback
from typing import Optional
from tpcp import Pipeline
from tpcp.exceptions import ValidationError
@dataclasses.dataclass
class Workflow(Pipeline):
    _composite_params = ("pipelines",)
    pipelines: Optional[list[tuple[str, Pipeline]]] = None
    def __init__(self, pipelines=None):
        self.pipelines = pipelines
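That’s it!
Now tpcp knows that pipelines should be a composite field and will complain if we assign something invalid to it. As the example below shows, the check is triggered when the parameters are accessed (e.g. via get_params), not by the assignment itself.
Composite fields may either have the value None or be a list of (name, value) tuples, as explained above: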
instance = Workflow()
instance.pipelines # Our default value of None
instance.pipelines = "something invalid"
try:
    print(instance.get_params())
except ValidationError:
    traceback.print_exc()
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tpcp/checkouts/latest/examples/recipies/_03_composite_objects.py", line 47, in <module>
    print(instance.get_params())
  File "/home/docs/checkouts/readthedocs.org/user_builds/tpcp/checkouts/latest/tpcp/_base.py", line 366, in get_params
    return _get_params(self, deep)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tpcp/checkouts/latest/tpcp/_base.py", line 496, in _get_params
    _assert_is_allowed_composite_value(v, key, i)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tpcp/checkouts/latest/tpcp/_base.py", line 452, in _assert_is_allowed_composite_value
    raise ValidationError(
tpcp.exceptions.ValidationError: The provided parameters for the composite field pipelines does not seem to be the right type. It should be a sequence of `(name, value)` tuples, but the obj at position 0 in the sequence was not a tuple but:
`s`
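To make the allowed shape concrete, here is a minimal, hypothetical sketch of such a check in plain Python. The function name check_composite_value and its messages are made up for illustration and are not tpcp's internal implementation; the sketch also assumes that names must be strings, which the error above does not state.

```python
from collections.abc import Sequence
from typing import Any, Optional


def check_composite_value(value: Any, field_name: str) -> Optional[str]:
    """Return None if `value` has the composite shape, else an error message.

    Hypothetical illustration only: the value must be None or a sequence of
    (name, sub_param) tuples; requiring str names is an assumption of this sketch.
    """
    if value is None:
        return None
    if not isinstance(value, Sequence):
        return f"{field_name}: expected a sequence of (name, value) tuples, got {type(value).__name__}"
    for i, item in enumerate(value):
        # Every entry has to be a 2-tuple of (name, sub_param)
        if not isinstance(item, tuple) or len(item) != 2:
            return f"{field_name}: the obj at position {i} is not a (name, value) tuple but: {item!r}"
        if not isinstance(item[0], str):
            return f"{field_name}: the name at position {i} is not a string: {item[0]!r}"
    return None


print(check_composite_value(None, "pipelines"))  # → None
print(check_composite_value([("pipe1", 1)], "pipelines"))  # → None
# A str is itself a sequence, so its first character is checked:
print(check_composite_value("something invalid", "pipelines"))
# → pipelines: the obj at position 0 is not a (name, value) tuple but: 's'
```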
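The error mentions `s` because the string "something invalid" is itself a sequence, so its first element, the character "s", is checked and found not to be a tuple.
While the individual sub-params in a composite field can be set to anything, the real value of explicit composite fields comes from using tpcp objects as sub-params: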
@dataclasses.dataclass
class MyPipeline(Pipeline):
    param: float = 4
    param2: int = 10
workflow_instance = Workflow(pipelines=[("pipe1", MyPipeline()), ("pipe2", MyPipeline(param2=5))])
We can now use get_params to get a deep inspection of the nested objects:
workflow_instance.get_params(deep=True)
{'pipelines__pipe1': MyPipeline(param=4, param2=10), 'pipelines__pipe1__param': 4, 'pipelines__pipe1__param2': 10, 'pipelines__pipe2': MyPipeline(param=4, param2=5), 'pipelines__pipe2__param': 4, 'pipelines__pipe2__param2': 5, 'pipelines': [('pipe1', MyPipeline(param=4, param2=10)), ('pipe2', MyPipeline(param=4, param2=5))]}
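The flat keys above follow a field__name__param pattern. The following sketch reproduces that naming scheme for a single level of nesting, using a plain dataclass (the hypothetical SubPipeline) as a stand-in for a tpcp object. flat_params is a made-up helper for illustration; tpcp's own get_params also handles deeper nesting, and its internals may differ.

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class SubPipeline:
    # Hypothetical stand-in for a tpcp object with two parameters
    param: float = 4
    param2: int = 10


def flat_params(params: dict[str, Any], composite: tuple[str, ...]) -> dict[str, Any]:
    """Flatten composite fields into `field__name` and `field__name__param` keys.

    Illustrative sketch: only one level of nesting is expanded.
    """
    out: dict[str, Any] = {}
    for key, value in params.items():
        if key in composite and value is not None:
            for name, sub_obj in value:
                prefix = f"{key}__{name}"
                out[prefix] = sub_obj  # the nested object itself
                if dataclasses.is_dataclass(sub_obj):
                    # one entry per parameter of the nested object
                    for f in dataclasses.fields(sub_obj):
                        out[f"{prefix}__{f.name}"] = getattr(sub_obj, f.name)
        out[key] = value  # the composite field as a whole
    return out


flat = flat_params(
    {"pipelines": [("pipe1", SubPipeline()), ("pipe2", SubPipeline(param2=5))]},
    composite=("pipelines",),
)
print(flat["pipelines__pipe2__param2"])  # → 5
```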
We can also set nested params using the same double-underscore key syntax:
workflow_instance = workflow_instance.set_params(pipelines__pipe1__param=2, pipelines__pipe2=MyPipeline(param2=4))
workflow_instance.get_params(deep=True)
{'pipelines__pipe1': MyPipeline(param=2, param2=10), 'pipelines__pipe1__param': 2, 'pipelines__pipe1__param2': 10, 'pipelines__pipe2': MyPipeline(param=4, param2=4), 'pipelines__pipe2__param': 4, 'pipelines__pipe2__param2': 4, 'pipelines': [('pipe1', MyPipeline(param=2, param2=10)), ('pipe2', MyPipeline(param=4, param2=4))]}
Note that it is not possible to set parameters for keys that don’t exist yet! In such a case, you need to manually recreate the full list and set it as a whole.
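A sketch of both operations on a plain list, assuming a hypothetical helper set_composite_param and the same kind of SubPipeline stand-in: updating an existing entry by name works, an unknown name raises a KeyError, and adding an entry means building a new list. With tpcp itself, the rebuilt list would then presumably be passed as a whole, e.g. via set_params(pipelines=...).

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class SubPipeline:
    # Hypothetical stand-in for a tpcp object
    param: float = 4
    param2: int = 10


def set_composite_param(items: list[tuple[str, Any]], path: str, value: Any) -> list[tuple[str, Any]]:
    """Return a new list with entry `name` (path "name") or one of its params ("name__param") replaced.

    Illustrative sketch, not tpcp's implementation. Raises KeyError if `name`
    is not already in the list, mirroring the limitation described above.
    """
    name, _, attr = path.partition("__")
    if name not in [n for n, _ in items]:
        raise KeyError(f"No entry named {name!r}; recreate the full list instead.")
    new_items = []
    for n, obj in items:
        if n == name:
            # Either swap a single parameter or replace the whole object
            obj = dataclasses.replace(obj, **{attr: value}) if attr else value
        new_items.append((n, obj))
    return new_items


pipes = [("pipe1", SubPipeline()), ("pipe2", SubPipeline(param2=5))]
pipes = set_composite_param(pipes, "pipe1__param", 2)
pipes = set_composite_param(pipes, "pipe2", SubPipeline(param2=4))
try:
    set_composite_param(pipes, "pipe3__param", 1)
except KeyError as err:
    print(err)  # "pipe3" does not exist yet
# Adding a new entry: rebuild the full list
pipes = [*pipes, ("pipe3", SubPipeline())]
print([n for n, _ in pipes])  # → ['pipe1', 'pipe2', 'pipe3']
```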