tpcp.optimize.optuna.CustomOptunaOptimize#

class tpcp.optimize.optuna.CustomOptunaOptimize(pipeline: tpcp._pipeline.PipelineT, study: optuna.study.study.Study, *, n_trials: Optional[int] = None, timeout: Optional[float] = None, callbacks: Optional[List[Callable[[optuna.study.study.Study, optuna.structs.FrozenTrial], None]]] = None, gc_after_trial: bool = False, show_progress_bar: bool = False, return_optimized: bool = True)[source]#

Base class for custom Optuna optimizer.

This provides a relatively simple tpcp-compatible interface to Optuna. You basically need to subclass this class and implement the create_objective method to return the objective function you want to optimize. The only difference to a normal Optuna objective function is that your objective should expect a pipeline and a dataset object as second and third argument (see Example). If there are parameters you want to make customizable (e.g. which metric to optimize for), expose them in the __init__ of your subclass.

Depending on your use case, your custom optimizer can be single-use with a bunch of “hard-coded” logic, or you can make it more general by exposing certain configuration options.

Parameters
pipeline

A tpcp pipeline with some hyper-parameters that should be optimized. This can either be a normal pipeline or an optimizable-pipeline. This fully depends on your implementation of the create_objective method.

study

The optuna Study that should be used for optimization.

n_trials

The number of trials. If this argument is set to None, there is no limitation on the number of trials; in this case you should use timeout instead. Because Optuna is called internally by this wrapper, you cannot set up a study without limits and end it using CTRL+C (as suggested by the Optuna docs). In that case the entire execution flow would be stopped.

timeout

Stop the study after the given number of seconds. If this argument is set to None, the study is executed without a time limit. In this case you should use n_trials to limit the execution.

return_optimized

If True, a pipeline object with the overall best parameters is created and re-optimized using all provided data as input. The optimized pipeline object is stored as optimized_pipeline_. How the “re-optimization” works depends on the type of pipeline provided. If it is a simple pipeline, no specific re-optimization will be performed and optimized_pipeline_ will simply be an instance of the pipeline with the best parameters identified in the search. When pipeline is a subclass of OptimizablePipeline, we attempt to call pipeline.self_optimize with the entire dataset provided to the optimize method. The result of this self-optimization will be set as optimized_pipeline_. If this behaviour is undesired, you can overwrite the return_optimized_pipeline method in a subclass.
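The two cases described above can be sketched as follows. This is a minimal illustration using stand-in classes, not the actual tpcp implementation; all names here are hypothetical.

```python
class Pipeline:
    """Stand-in for a simple (non-optimizable) pipeline."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class OptimizablePipeline(Pipeline):
    """Stand-in for a pipeline that supports self_optimize."""

    def self_optimize(self, dataset):
        # Stand-in for real training on the full dataset.
        self.trained_on_n_ = len(dataset)
        return self


def return_optimized_pipeline(pipeline, dataset, best_params):
    # Simple pipelines just get the best parameters set ...
    optimized = pipeline.set_params(**best_params)
    # ... while optimizable pipelines are additionally re-optimized
    # on all provided data.
    if isinstance(optimized, OptimizablePipeline):
        optimized = optimized.self_optimize(dataset)
    return optimized


simple = return_optimized_pipeline(Pipeline(), [1, 2, 3], {"para": 2})
trained = return_optimized_pipeline(OptimizablePipeline(), [1, 2, 3], {"para": 2})
```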

callbacks

List of callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial.

show_progress_bar

Flag to show progress bars or not.

gc_after_trial

Run the garbage collector after each trial. Check the Optuna documentation for more details.

Other Parameters
dataset

The dataset instance passed to the optimize method.

Attributes
search_results_

Detailed results of the study.

optimized_pipeline_

An instance of the input pipeline with the best parameter set. This is only available if return_optimized is not False.

best_params_

Parameters of the best trial in the Study.

best_score_

Best score reached in the study.

best_trial_

Best trial in the Study.

study_

The study object itself. This should usually be identical to self.study.

Notes

As this wrapper attempts to fully encapsulate all Optuna calls so that it can run seamlessly in a cross-validation (or similar), you cannot start multiple Optuna optimizations at the same time, which is the preferred way of multiprocessing for Optuna. As a result, you are limited to single-process operations. If you want to get “hacky”, you can try the approach suggested here to create a study that uses joblib for internal multiprocessing.

Examples

>>> from tpcp import Dataset, Pipeline
>>> from tpcp.validate import Scorer
>>> from optuna import Trial, create_study
>>> from optuna import samplers
>>>
>>> class MyOptunaOptimizer(CustomOptunaOptimize):
...     def create_objective(self):
...         def objective(trial: Trial, pipeline: Pipeline, dataset: Dataset):
...             trial.suggest_float("my_pipeline_para", 0, 3)
...             mean_score, _ = Scorer(lambda pipeline, dp: pipeline.score(dp))(pipeline, dataset)
...             return mean_score
...         return objective
>>>
>>> study = create_study(sampler=samplers.RandomSampler())
>>> opti = MyOptunaOptimizer(pipeline=MyPipeline(), study=study, n_trials=10)
>>> opti = opti.optimize(MyDataset())

Methods

clone()

Create a new instance of the class with all parameters copied over.

create_objective()

Return the objective function that should be optimized.

get_params([deep])

Get parameters for this algorithm.

optimize(dataset, **_)

Optimize the objective over the dataset and find the best parameter combination.

return_optimized_pipeline(pipeline, dataset, ...)

Return the pipeline with the best parameters of a study.

run(datapoint)

Run the optimized pipeline.

safe_run(datapoint)

Run the optimized pipeline.

score(datapoint)

Run score of the optimized pipeline.

set_params(**params)

Set the parameters of this Algorithm.

__init__(pipeline: tpcp._pipeline.PipelineT, study: optuna.study.study.Study, *, n_trials: Optional[int] = None, timeout: Optional[float] = None, callbacks: Optional[List[Callable[[optuna.study.study.Study, optuna.structs.FrozenTrial], None]]] = None, gc_after_trial: bool = False, show_progress_bar: bool = False, return_optimized: bool = True) None[source]#
_call_optimize(study: optuna.study.study.Study, objective: Callable[[optuna.trial._trial.Trial], Union[float, Sequence[float]]]) optuna.study.study.Study[source]#

Call the optuna study.

This is a separate method to make it easy to modify how the study is called.
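The benefit of this separation can be sketched with stand-in classes (not the real tpcp/Optuna types): a subclass can change how the study is invoked without touching the rest of the optimizer. All names below are hypothetical.

```python
class BaseOptimizer:
    """Stand-in for the wrapper: the study call lives in its own method."""

    def _call_optimize(self, study, objective):
        study.optimize(objective)
        return study


class CountingOptimizer(BaseOptimizer):
    """Example override that adds bookkeeping around the study call."""

    n_calls = 0

    def _call_optimize(self, study, objective):
        self.n_calls += 1
        return super()._call_optimize(study, objective)


class StubStudy:
    """Stand-in study that just evaluates the objective once."""

    def optimize(self, objective):
        self.best_value = objective(None)


opt = CountingOptimizer()
study = opt._call_optimize(StubStudy(), lambda trial: 0.5)
```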

clone() typing_extensions.Self[source]#

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and all nested objects.

create_objective() Callable[[optuna.trial._trial.Trial, tpcp._pipeline.PipelineT, tpcp._dataset.DatasetT], Union[float, Sequence[float]]][source]#

Return the objective function that should be optimized.

This method should be implemented by a child class and return an objective function that is compatible with Optuna. However, compared to a normal Optuna objective function, the function should expect a pipeline and a dataset object as additional inputs to the optimization Trial object.
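The signature difference can be sketched as follows: the objective returned by create_objective takes (trial, pipeline, dataset), and binding the last two arguments yields the usual one-argument form Optuna expects. StubTrial and the scoring logic below are illustrative stand-ins, not real tpcp/Optuna objects.

```python
from functools import partial


def objective(trial, pipeline, dataset):
    # Stand-in scoring: just read a (hypothetical) trial parameter.
    return trial.params.get("my_pipeline_para", 0.0)


class StubTrial:
    """Minimal stand-in for an Optuna Trial."""

    def __init__(self, params):
        self.params = params


pipeline, dataset = object(), object()
# Bind pipeline and dataset so the objective matches Optuna's expected shape.
optuna_style_objective = partial(objective, pipeline=pipeline, dataset=dataset)
result = optuna_style_objective(StubTrial({"my_pipeline_para": 1.5}))
```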

get_params(deep: bool = True) Dict[str, Any][source]#

Get parameters for this algorithm.

Parameters
deep

Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).

Returns
params

Parameter names mapped to their values.
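A minimal sketch of the nested-parameter naming convention, using hypothetical stand-in classes rather than real tpcp algorithms:

```python
class LowPassFilter:
    """Hypothetical nested algorithm object."""

    def __init__(self, cutoff=10.0):
        self.cutoff = cutoff


class MyAlgorithm:
    """Hypothetical algorithm holding a nested object."""

    def __init__(self, filt=None, threshold=0.5):
        self.filt = filt if filt is not None else LowPassFilter()
        self.threshold = threshold

    def get_params(self, deep=True):
        params = {"filt": self.filt, "threshold": self.threshold}
        if deep:
            # Nested params use the "<name>__" prefix (two underscores).
            params["filt__cutoff"] = self.filt.cutoff
        return params


params = MyAlgorithm().get_params(deep=True)
```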

optimize(dataset: tpcp._dataset.DatasetT, **_: Any) typing_extensions.Self[source]#

Optimize the objective over the dataset and find the best parameter combination.

This method calls self.create_objective to obtain the objective function that should be optimized.

Parameters
dataset

The dataset used for optimization.

return_optimized_pipeline(pipeline: tpcp._pipeline.PipelineT, dataset: tpcp._dataset.DatasetT, study: optuna.study.study.Study) tpcp._pipeline.PipelineT[source]#

Return the pipeline with the best parameters of a study.

This either just returns the pipeline with the best parameters set, or if the pipeline is a subclass of OptimizablePipeline it attempts a re-optimization of the pipeline using the provided dataset.

This functionality is a sensible default, but you are expected to overwrite this method in custom subclasses if specific behaviour is needed.

Don’t call this function on its own! It is only expected to be called internally by optimize.

run(datapoint: tpcp._dataset.DatasetT) tpcp._pipeline.PipelineT[source]#

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

safe_run(datapoint: tpcp._dataset.DatasetT) tpcp._pipeline.PipelineT[source]#

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

score(datapoint: tpcp._dataset.DatasetT) Union[float, Dict[str, float]][source]#

Run score of the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

set_params(**params: Any) typing_extensions.Self[source]#

Set the parameters of this Algorithm.

To set parameters of nested objects use nested_object_name__para_name=.
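A minimal sketch of how the nested_object_name__para_name= convention can be routed to the nested object. All classes and names here are hypothetical stand-ins, not the real tpcp implementation.

```python
class LowPassFilter:
    """Hypothetical nested algorithm object."""

    def __init__(self, cutoff=10.0):
        self.cutoff = cutoff


class MyAlgorithm:
    """Hypothetical algorithm holding a nested object."""

    def __init__(self):
        self.filt = LowPassFilter()
        self.threshold = 0.5

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # "filt__cutoff" is routed to the nested filt object.
                setattr(getattr(self, name), sub_key, value)
            else:
                setattr(self, key, value)
        return self


algo = MyAlgorithm().set_params(threshold=0.8, filt__cutoff=5.0)
```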
