tpcp.optimize.optuna.OptunaSearch
- class tpcp.optimize.optuna.OptunaSearch(pipeline: PipelineT, create_study: Callable[[], Study], create_search_space: Callable[[Trial], None], *, scoring: Optional[Union[Callable[[PipelineT, DatasetT], Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]], Dict[str, Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]]]]]], Scorer[PipelineT, DatasetT, Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]]]]]], score_name: Optional[str] = None, n_trials: Optional[int] = None, timeout: Optional[float] = None, callbacks: Optional[List[Callable[[Study, FrozenTrial], None]]] = None, gc_after_trial: bool = False, n_jobs: int = 1, show_progress_bar: bool = False, return_optimized: bool = True)
GridSearch equivalent using Optuna.
An opinionated parameter optimization for simple (i.e. non-optimizable) pipelines that can be used as a replacement for GridSearch.
- Parameters:
- pipeline
A tpcp pipeline with some hyper-parameters that should be optimized. This should be a normal (i.e. non-optimizable) pipeline when using this class.
- create_study
A callable that returns an optuna study instance to be used for the optimization. It will be called as part of the optimize method without parameters. The resulting study object can be accessed via self.study_ after the optimization is finished. Creating the study is handled via a callable, instead of providing the study object itself, to make it possible to create individual studies when CustomOptuna optimize is called by an external wrapper (i.e. cross_validate).
- create_search_space
A callable that takes a Trial object as input and calls suggest_* methods on it to define the search space.
- scoring
A callable that can score a single data point given a pipeline. This function should return either a single score or a dictionary of scores. If scoring is None, the default score method of the pipeline is used instead.
Note that if scoring returns a dictionary, score_name must be set to the name of the score that should be used for ranking.
- score_name
The name of the score that should be used for ranking in case the scoring function returns a dictionary of values.
- n_trials
The number of trials. If this argument is set to None, there is no limit on the number of trials. In this case you should use timeout instead. Because optuna is called internally by this wrapper, you cannot set up a study without limits and end it using CTRL+C (as suggested by the Optuna docs). Doing so would stop the entire execution flow.
- timeout
Stop the study after the given number of seconds. If this argument is set to None, the study is executed without a time limit. In this case you should use n_trials to limit the execution.
- return_optimized
If True, a pipeline object with the overall best parameters is created. The optimized pipeline object is stored as optimized_pipeline_.
- callbacks
List of callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial.
- n_jobs
Number of parallel jobs to use (default = 1 -> single process, -1 -> all available cores). This uses joblib with the multiprocessing backend to parallelize the optimization.
Warning
Read the notes in CustomOptunaOptimize on multiprocessing below before using this feature.
- show_progress_bar
Whether to show progress bars.
- gc_after_trial
Run the garbage collector after each trial. Check the optuna documentation for more details.
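The two callables described above might look roughly like the following sketch. It assumes a hypothetical pipeline with parameters named threshold and window_size (these names are not part of tpcp), that the names passed to suggest_* correspond to pipeline parameter names, and that higher scores are better. Imports are placed inside the functions so that the definitions themselves do not require optuna or tpcp at import time.

```python
def create_study():
    """Return a fresh optuna study; called without arguments by optimize."""
    import optuna  # imported lazily, only needed once a study is created

    # Assumption: higher scores are better for the pipeline being tuned.
    return optuna.create_study(direction="maximize")


def create_search_space(trial):
    """Define the search space by calling suggest_* methods on the trial."""
    # "threshold" and "window_size" are hypothetical pipeline parameters.
    trial.suggest_float("threshold", 0.1, 1.0)
    trial.suggest_int("window_size", 3, 15)


def build_search(pipeline, n_trials=20):
    """Wrap a normal (non-optimizable) pipeline in an OptunaSearch instance."""
    from tpcp.optimize.optuna import OptunaSearch

    return OptunaSearch(
        pipeline,
        create_study,
        create_search_space,
        n_trials=n_trials,  # set n_trials or timeout so the study terminates
    )
```

A call could then look like build_search(my_pipeline).optimize(my_dataset), where my_pipeline and my_dataset are placeholders for your own objects.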
- Other Parameters:
- dataset
The dataset instance passed to the optimize method.
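As an illustration of the callbacks parameter listed under Parameters, the sketch below builds a callback with the required (Study, FrozenTrial) signature. It assumes a single-objective study in which higher values are better; make_early_stop_callback and target_score are hypothetical names. How stopping interacts with n_jobs > 1 has not been checked here.

```python
def make_early_stop_callback(target_score):
    """Build a callback that stops the study once a trial reaches target_score."""

    def callback(study, trial):
        # trial.value is None for trials that did not complete (e.g. failed).
        if trial.value is not None and trial.value >= target_score:
            study.stop()  # ask optuna to stop after the current trial(s)

    return callback
```

The resulting function could be passed as callbacks=[make_early_stop_callback(0.95)], with 0.95 being an arbitrary example threshold.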
- Attributes:
- search_results_
Detailed results of the study.
- optimized_pipeline_
An instance of the input pipeline with the best parameter set. This is only available if return_optimized is not False.
- best_params_
Parameters of the best trial in the Study.
- best_score_
Best score reached in the study.
- best_trial_
Best trial in the Study.
- study_
The study object itself. This should usually be identical to self.study.
- multimetric_
Whether the scorer returned multiple scores.
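After optimize has been called, the attributes above can be read from the search object. The helper below is a small sketch; summarize_search is a hypothetical name, and the only optuna detail it relies on is the trials list of the study.

```python
def summarize_search(search):
    """Collect the main result attributes of an already optimized search object."""
    summary = {
        "best_params": search.best_params_,
        "best_score": search.best_score_,
        "n_trials_run": len(search.study_.trials),
    }
    # optimized_pipeline_ is only available if return_optimized was not False.
    summary["has_optimized_pipeline"] = hasattr(search, "optimized_pipeline_")
    return summary
```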
Methods
- clone(): Create a new instance of the class with all parameters copied over.
- create_objective(): Create the objective function for optuna.
- get_params([deep]): Get parameters for this algorithm.
- optimize(dataset, **_): Optimize the objective over the dataset and find the best parameter combination.
- return_optimized_pipeline(pipeline, dataset, ...): Return the pipeline with the best parameters of a study.
- run(datapoint): Run the optimized pipeline.
- safe_run(datapoint): Run the optimized pipeline.
- score(datapoint): Run score of the optimized pipeline.
- set_params(**params): Set the parameters of this Algorithm.
- __init__(pipeline: PipelineT, create_study: Callable[[], Study], create_search_space: Callable[[Trial], None], *, scoring: Optional[Union[Callable[[PipelineT, DatasetT], Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]], Dict[str, Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]]]]]], Scorer[PipelineT, DatasetT, Union[T, Aggregator[Any], Dict[str, Union[T, Aggregator[Any]]]]]]], score_name: Optional[str] = None, n_trials: Optional[int] = None, timeout: Optional[float] = None, callbacks: Optional[List[Callable[[Study, FrozenTrial], None]]] = None, gc_after_trial: bool = False, n_jobs: int = 1, show_progress_bar: bool = False, return_optimized: bool = True)
- _call_optimize(study: Study, objective: Callable[[Trial], Union[float, Sequence[float]]])
Call the optuna study.
This is a separate method to make it easy to modify how the study is called.
- clone() → Self
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- create_objective() → Callable[[Trial, PipelineT, DatasetT], Union[float, Sequence[float]]]
Create the objective function for optuna.
This is an internal function and should not be called directly.
- get_params(deep: bool = True) → Dict[str, Any]
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two "_" at the end).
- Returns:
- params
Parameter names mapped to their values.
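The double-underscore prefix convention described for deep can be illustrated with two stand-in classes. This only mimics the naming scheme and is not tpcp's implementation; LowPass, ToyPipeline, and their parameters are hypothetical.

```python
class LowPass:
    """Hypothetical nested algorithm with a single parameter."""

    def __init__(self, cutoff=1.0):
        self.cutoff = cutoff

    def get_params(self, deep=True):
        return {"cutoff": self.cutoff}


class ToyPipeline:
    """Hypothetical pipeline holding one nested algorithm object."""

    def __init__(self, lowpass, threshold=0.5):
        self.lowpass = lowpass
        self.threshold = threshold

    def get_params(self, deep=True):
        params = {"lowpass": self.lowpass, "threshold": self.threshold}
        if deep:
            # Expose parameters of nested objects as "<name>__<param>".
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params


pipe = ToyPipeline(LowPass(cutoff=2.0))
shallow_keys = sorted(pipe.get_params(deep=False))
deep_keys = sorted(pipe.get_params(deep=True))
```

With deep=True the nested cutoff appears under the key lowpass__cutoff, while deep=False returns only the top-level parameters.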
- optimize(dataset: DatasetT, **_: Any) → Self
Optimize the objective over the dataset and find the best parameter combination.
This method calls self.create_objective to obtain the objective function that should be optimized.
- Parameters:
- dataset
The dataset used for optimization.
- return_optimized_pipeline(pipeline: PipelineT, dataset: DatasetT, study: Study) → PipelineT
Return the pipeline with the best parameters of a study.
This is an internal function and should not be called directly.
- run(datapoint: DatasetT) → PipelineT
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- safe_run(datapoint: DatasetT) → PipelineT
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.