OptunaSearch#
- class tpcp.optimize.optuna.OptunaSearch(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[PipelineT, DatasetT], score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: list[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True)[source]#
GridSearch equivalent using Optuna.
An opinionated parameter optimization for simple (i.e. non-optimizable) pipelines that can be used as a replacement for GridSearch.
- Parameters:
- pipeline
A tpcp pipeline with some hyper-parameters that should be optimized. This should be a normal (i.e. non-optimizable) pipeline when using this class.
- get_study_params
A callable that returns a dictionary with the parameters that should be used to create the study (i.e. passed to optuna.create_study). Creating the study is handled via a callable, instead of providing the parameters or the study object directly, to make it possible to create individual studies when optimize is called by an external wrapper (e.g. cross_validate). Further, the provided function is called with a single parameter seed that can be used to create samplers and pruners with different random seeds. This is important for multi-processing (see more in Notes). Note that this callable should return consistent output when called multiple times with the same seed. Otherwise, unexpected behaviour can occur, where different processes use different samplers/pruners in a multi-processing setting (n_jobs > 1).
- create_search_space
A callable that takes a Trial object as input and calls suggest_* methods on it to define the search space.
- scoring
A callable that can score a single data point given a pipeline. This function should return either a single score or a dictionary of scores. If scoring is None, the default score method of the pipeline is used instead.
Note that if scoring returns a dictionary, score_name must be set to the name of the score that should be used for ranking.
- score_name
The name of the score that should be used for ranking in case the scoring function returns a dictionary of values.
- n_trials
The number of trials. If this argument is set to None, there is no limitation on the number of trials. In this case you should use timeout instead. Because Optuna is called internally by this wrapper, you cannot set up a study without limits and end it using CTRL+C (as suggested by the Optuna docs). Doing so would stop the entire execution flow.
- timeout
Stop the study after the given number of seconds. If this argument is set to None, the study is executed without time limitation. In this case you should use n_trials to limit the execution.
- return_optimized
If True, a pipeline object with the overall best parameters is created. The optimized pipeline object is stored as optimized_pipeline_.
- callbacks
List of callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial.
- n_jobs
Number of parallel jobs to use (default = 1 -> single process, -1 -> all available cores). This uses joblib with the multiprocessing backend to parallelize the optimization.
Warning
Read the notes in CustomOptunaOptimize on multiprocessing below before using this feature.
- random_seed
A random seed that is used as the base for the seed passed to your implementation of get_study_params. If None, this is set to a random integer between 0 and 100 (derived using numpy.random.randint). In case of multiprocessing, this seed is used as an offset to create different seeds for each process.
- eval_str_paras
This can be a sequence (tuple/list) of parameter names used by Optuna that should be evaluated using literal_eval instead of just being set as strings on the pipeline. The main use case is to allow the user to pass a list of strings to suggest_categorical but have the actual pipeline receive the evaluated value of each string. This is required, as many storage backends of Optuna only support numbers or strings as categorical values.
A typical example would be wanting to select a set of axes for an algorithm that are expressed as a list/tuple of strings. In this case you would use a stringified version of these tuples as the categorical values in the Optuna study and then use eval_str_paras to evaluate the stringified version to the actual tuple.
>>> def search_space(trial):
...     trial.suggest_categorical(
...         "axis", ["('x',)", "('y',)", "('z',)", "('x', 'y')"]
...     )
>>> optuna_opt = OptunaSearch(pipeline, ..., eval_str_paras=["axis"])
- show_progress_bar
Flag to show progress bars or not.
- gc_after_trial
Run the garbage collector after each trial. Check the Optuna documentation for more details.
- Other Parameters:
- dataset
The dataset instance passed to the optimize method.
- Attributes:
- search_results_
Detailed results of the study.
- optimized_pipeline_
An instance of the input pipeline with the best parameter set. This is only available if return_optimized is not False.
- best_params_
Parameters of the best trial in the Study.
- best_score_
Best score reached in the study.
- best_trial_
Best trial in the Study.
- study_
The study object itself. This should usually be identical to self.study.
- multimetric_
Whether the scorer returned multiple scores.
- random_seed_
The actual random seed used for the optimization. This is either the value passed to random_seed or a random integer between 0 and 100.
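The relationship between scoring, score_name, and multimetric_ can be sketched as follows. This is a simplified illustration of the documented rule (a dictionary of scores needs score_name to pick the ranking value), not the tpcp implementation; select_ranking_score is a hypothetical helper.

```python
from typing import Optional, Union


def select_ranking_score(
    scores: Union[float, dict], score_name: Optional[str] = None
) -> float:
    # Hypothetical helper: reduce the output of a scoring function to the
    # single value that would be used to rank trials.
    if isinstance(scores, dict):
        # Multi-metric case: the ranking score has to be named explicitly.
        if score_name is None:
            raise ValueError(
                "score_name must be set when scoring returns a dictionary."
            )
        return scores[score_name]
    # Single-metric case: the score is used directly.
    return scores
```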
Methods
clone(): Create a new instance of the class with all parameters copied over.
create_objective(): Create the objective function for optuna.
get_params([deep]): Get parameters for this algorithm.
optimize(dataset, **_): Optimize the objective over the dataset and find the best parameter combination.
return_optimized_pipeline(pipeline, dataset, ...): Return the pipeline with the best parameters of a study.
run(datapoint): Run the optimized pipeline.
safe_run(datapoint): Run the optimized pipeline.
sanitize_params(params): Sanitize the parameters of a trial.
set_params(**params): Set the parameters of this Algorithm.
- __init__(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[PipelineT, DatasetT], score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: list[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True) None[source]#
- _call_optimize(study: Study, objective: Callable[[Trial], float | Sequence[float]])[source]#
Call the optuna study.
This is a separate method to make it easy to modify how the study is called.
- property best_trial_: FrozenTrial#
Best trial in the Study.
- clone() Self[source]#
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- create_objective() Callable[[Trial, PipelineT, DatasetT], float | Sequence[float]][source]#
Create the objective function for optuna.
This is an internal function and should not be called directly.
- get_params(deep: bool = True) dict[str, Any][source]#
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).
- Returns:
- params
Parameter names mapped to their values.
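The double-underscore naming convention can be illustrated with a toy example. The classes Smoother and ToyPipeline below are hypothetical and only mimic the naming scheme described above; they are not how tpcp implements get_params.

```python
from typing import Any


class Smoother:
    """Hypothetical nested algorithm with a single parameter."""

    def __init__(self, cutoff: float = 5.0) -> None:
        self.cutoff = cutoff

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        return {"cutoff": self.cutoff}


class ToyPipeline:
    """Hypothetical pipeline that holds a nested algorithm object."""

    def __init__(self, smoother: Smoother, threshold: float = 0.5) -> None:
        self.smoother = smoother
        self.threshold = threshold

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        params: dict[str, Any] = {
            "smoother": self.smoother,
            "threshold": self.threshold,
        }
        if deep:
            # Add the parameters of nested objects, prefixed with
            # "<nested_object_name>__".
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for key, nested in value.get_params(deep=True).items():
                        params[f"{name}__{key}"] = nested
        return params


pipeline = ToyPipeline(Smoother(cutoff=2.0))
deep_params = pipeline.get_params(deep=True)
shallow_params = pipeline.get_params(deep=False)
```

Here deep_params contains the additional key smoother__cutoff, while shallow_params only lists smoother and threshold.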
- optimize(dataset: DatasetT, **_: Any) Self[source]#
Optimize the objective over the dataset and find the best parameter combination.
This method calls self.create_objective to obtain the objective function that should be optimized.
- Parameters:
- dataset
The dataset used for optimization.
- return_optimized_pipeline(pipeline: PipelineT, dataset: DatasetT, study: Study) PipelineT[source]#
Return the pipeline with the best parameters of a study.
This is an internal function and should not be called directly.
- run(datapoint: DatasetT) PipelineT[source]#
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- safe_run(datapoint: DatasetT) PipelineT[source]#
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- sanitize_params(params: dict[str, Any]) dict[str, Any][source]#
Sanitize the parameters of a trial.
This will apply the str evaluation controlled by self.eval_str_paras to the parameters. Call this method before passing the parameters to the pipeline in your objective function.
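The effect of the str evaluation can be sketched with the standard library alone. The function evaluate_str_params below is a hypothetical stand-in that assumes the evaluation is done with ast.literal_eval on the listed parameter names, as the description of eval_str_paras suggests; the actual method may differ in detail.

```python
from ast import literal_eval
from typing import Any, Sequence


def evaluate_str_params(
    params: dict[str, Any], eval_str_paras: Sequence[str]
) -> dict[str, Any]:
    # Evaluate only the listed parameters; all others are passed through
    # unchanged.
    return {
        name: literal_eval(value) if name in eval_str_paras else value
        for name, value in params.items()
    }


# Parameters as they might come out of a trial: "axis" is a stringified tuple.
trial_params = {"axis": "('x', 'y')", "threshold": 0.5}
sanitized = evaluate_str_params(trial_params, eval_str_paras=["axis"])
```

After the call, sanitized["axis"] is the tuple ('x', 'y') rather than a string, while threshold is left as it was.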
- property search_results_: dict[str, Sequence[Any]]#
Detailed results of the study.
This basically contains the same information as self.study_.trials_dataframe(), with some small modifications:
- columns starting with “params_” are renamed to “param_”
- a new column called “params” containing all parameters as dict is added
- “value” is renamed to “score”
- the score of pruned trials is set to np.nan
These changes are made to make the output comparable to the output of GridSearch and GridSearchCV.
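The listed modifications can be sketched on a plain dictionary of columns. This is only an illustration: to_search_results is a hypothetical helper, the input dictionary is a stand-in for the output of trials_dataframe(), and the "state" column with the values "COMPLETE" and "PRUNED" is an assumption about that output.

```python
import math
from typing import Any


def to_search_results(trials: dict[str, list[Any]]) -> dict[str, list[Any]]:
    results: dict[str, list[Any]] = {}
    param_columns: dict[str, list[Any]] = {}
    for name, values in trials.items():
        if name.startswith("params_"):
            # "params_<name>" becomes "param_<name>".
            short_name = name[len("params_"):]
            results["param_" + short_name] = list(values)
            param_columns[short_name] = values
        elif name == "value":
            # "value" becomes "score"; pruned trials get NaN as score.
            results["score"] = [
                math.nan if state == "PRUNED" else value
                for value, state in zip(values, trials["state"])
            ]
        else:
            results[name] = list(values)
    # New "params" column: one dict with all parameters per trial.
    n_trials = len(trials["state"])
    results["params"] = [
        {key: column[i] for key, column in param_columns.items()}
        for i in range(n_trials)
    ]
    return results


trials = {
    "number": [0, 1],
    "value": [0.8, 0.3],
    "params_threshold": [0.2, 0.9],
    "state": ["COMPLETE", "PRUNED"],
}
results = to_search_results(trials)
```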