tpcp.optimize.optuna.OptunaSearch#
- class tpcp.optimize.optuna.OptunaSearch(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], T | Aggregator[Any] | Dict[str, T | Aggregator[Any]] | Dict[str, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]]] | Scorer[PipelineT, DatasetT, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]] | None, score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: List[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True)[source]#
GridSearch equivalent using Optuna.
An opinionated parameter optimization for simple (i.e. non-optimizable) pipelines that can be used as a replacement for GridSearch.
- Parameters:
- pipeline
A tpcp pipeline with some hyper-parameters that should be optimized. This should be a normal (i.e. non-optimizable) pipeline when using this class.
- get_study_params
A callable that returns a dictionary with the parameters that should be used to create the study (i.e. passed to optuna.create_study). Creating the study is handled via a callable, instead of providing the parameters or the study object directly, to make it possible to create individual studies when CustomOptunaOptimize is called by an external wrapper (i.e. cross_validate). Further, the provided function is called with a single parameter seed that can be used to create samplers and pruners with different random seeds. This is important for multi-processing (see more in Notes). Note that this method should return consistent output when called multiple times with the same seed. Otherwise, unexpected behaviour can occur, where different processes use different samplers/pruners in a multi-processing setting (n_jobs > 1).
- create_search_space
A callable that takes a Trial object as input and calls suggest_* methods on it to define the search space.
- scoring
A callable that can score a single data point given a pipeline. This function should return either a single score or a dictionary of scores. If scoring is None, the default score method of the pipeline is used instead. Note that if scoring returns a dictionary, score_name must be set to the name of the score that should be used for ranking.
- score_name
The name of the score that should be used for ranking in case the scoring function returns a dictionary of values.
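For illustration, a multi-metric scoring callable might look like the following sketch. The metric names and the scoring logic are hypothetical placeholders, not part of the tpcp API; when a dict is returned like this, score_name selects the metric used for ranking.

```python
# Sketch of a scoring function returning multiple scores (hypothetical
# metric names and placeholder values, for illustration only).
def scoring(pipeline, datapoint):
    # A real implementation would run the pipeline on the datapoint and
    # compare its results to a reference; the values below are placeholders.
    pipeline = pipeline.safe_run(datapoint)
    return {"precision": 0.9, "recall": 0.8}

# With a dict return, the ranking metric must be named explicitly, e.g.:
# OptunaSearch(pipeline, ..., scoring=scoring, score_name="precision")
```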
- n_trials
The number of trials. If this argument is set to None, there is no limitation on the number of trials. In this case you should use timeout instead. Because optuna is called internally by this wrapper, you cannot set up a study without limits and end it using CTRL+C (as suggested by the Optuna docs). In this case the entire execution flow would be stopped.
- timeout
Stop the study after the given number of seconds. If this argument is set to None, the study is executed without time limitation. In this case you should use n_trials to limit the execution.
- return_optimized
If True, a pipeline object with the overall best parameters is created. The optimized pipeline object is stored as optimized_pipeline_.
- callbacks
List of callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial.
- n_jobs
Number of parallel jobs to use (default = 1 -> single process, -1 -> all available cores). This uses joblib with the multiprocessing backend to parallelize the optimization.
Warning
Read the notes on multiprocessing in CustomOptunaOptimize below before using this feature.
- random_seed
A random seed that is used as base for the seed passed to your implementation of get_study_params. If None, this is set to a random integer between 0 and 100 (derived using numpy.random.randint). In case of multiprocessing, this seed is used as an offset to create different seeds for each process.
- eval_str_paras
This can be a sequence (tuple/list) of parameter names used by Optuna that should be evaluated using literal_eval instead of just being set as a string on the pipeline. The main use case of this is to allow the user to pass a list of strings to suggest_categorical, but have the actual pipeline receive the evaluated value of this string. This is required, as many storage backends of Optuna only support numbers or strings as categorical values.
A typical example would be wanting to select a set of axes for an algorithm that are expressed as a list/tuple of strings. In this case you would use a stringified version of these tuples as the categorical values in the optuna study and then use eval_str_paras to evaluate the stringified version to the actual tuple.
>>> def search_space(trial):
...     trial.suggest_categorical("axis", ["('x',)", "('y',)", "('z',)", "('x', 'y')"])
>>> optuna_opt = OptunaSearch(pipeline, ..., eval_str_paras=["axis"])
- show_progress_bar
Flag to show progress bars or not.
- gc_after_trial
Run the garbage collector after each trial. Check the Optuna documentation for more details.
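To make the two required callables concrete, here is a minimal sketch. The pipeline parameter names and ranges are hypothetical examples, and get_study_params assumes the optuna package is available; the sampled values are applied to the pipeline by OptunaSearch itself.

```python
# Sketch of the two callables OptunaSearch expects (the parameter names
# "threshold" and "window_size" are hypothetical examples).

def get_study_params(seed: int) -> dict:
    # Must return a consistent configuration for a given seed, so that all
    # worker processes build identical samplers when n_jobs > 1.
    from optuna.samplers import TPESampler  # assumes optuna is installed
    return {"direction": "maximize", "sampler": TPESampler(seed=seed)}

def create_search_space(trial) -> None:
    # Only calls suggest_* methods on the trial to define the search space;
    # OptunaSearch reads the suggested values and sets them on the pipeline.
    trial.suggest_float("threshold", 0.1, 1.0)
    trial.suggest_categorical("window_size", [50, 100, 200])

# Typical usage (pipeline, scoring, and dataset are placeholders):
# opt = OptunaSearch(pipeline, get_study_params, create_search_space,
#                    scoring=scoring, n_trials=10).optimize(dataset)
```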
- Other Parameters:
- dataset
The dataset instance passed to the optimize method.
- Attributes:
- search_results_
Detailed results of the study.
- optimized_pipeline_
An instance of the input pipeline with the best parameter set. This is only available if return_optimized is not False.
- best_params_
Parameters of the best trial in the Study.
- best_score_
Best score reached in the study.
- best_trial_
Best trial in the Study.
- study_
The study object itself. This should usually be identical to self.study.
- multimetric_
Whether the scorer returned multiple scores.
- random_seed_
The actual random seed used for the optimization. This is either the value passed to random_seed or a random integer between 0 and 100.
Methods
clone()
Create a new instance of the class with all parameters copied over.
create_objective()
Create the objective function for optuna.
get_params([deep])
Get parameters for this algorithm.
optimize(dataset, **_)
Optimize the objective over the dataset and find the best parameter combination.
return_optimized_pipeline(pipeline, dataset, ...)
Return the pipeline with the best parameters of a study.
run(datapoint)
Run the optimized pipeline.
safe_run(datapoint)
Run the optimized pipeline.
sanitize_params(params)
Sanitize the parameters of a trial.
score(datapoint)
Run score of the optimized pipeline.
set_params(**params)
Set the parameters of this Algorithm.
- __init__(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], T | Aggregator[Any] | Dict[str, T | Aggregator[Any]] | Dict[str, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]]] | Scorer[PipelineT, DatasetT, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]] | None, score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: List[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True) None [source]#
- _call_optimize(study: Study, objective: Callable[[Trial], float | Sequence[float]])[source]#
Call the optuna study.
This is a separate method to make it easy to modify how the study is called.
- property best_trial_: FrozenTrial#
Best trial in the Study.
- clone() Self [source]#
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- create_objective() Callable[[Trial, PipelineT, DatasetT], float | Sequence[float]] [source]#
Create the objective function for optuna.
This is an internal function and should not be called directly.
- get_params(deep: bool = True) Dict[str, Any] [source]#
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the double underscore at the end).
- Returns:
- params
Parameter names mapped to their values.
- optimize(dataset: DatasetT, **_: Any) Self [source]#
Optimize the objective over the dataset and find the best parameter combination.
This method calls self.create_objective to obtain the objective function that should be optimized.
- Parameters:
- dataset
The dataset used for optimization.
- return_optimized_pipeline(pipeline: PipelineT, dataset: DatasetT, study: Study) PipelineT [source]#
Return the pipeline with the best parameters of a study.
This is an internal function and should not be called directly.
- run(datapoint: DatasetT) PipelineT [source]#
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- safe_run(datapoint: DatasetT) PipelineT [source]#
Run the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- sanitize_params(params: Dict[str, Any]) Dict[str, Any] [source]#
Sanitize the parameters of a trial.
This will apply the str evaluation controlled by self.eval_str_paras to the parameters. Call this method before passing the parameters to the pipeline in your objective function.
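Conceptually, this string evaluation can be sketched as follows. This is a simplified re-implementation for illustration, not the actual tpcp code:

```python
from ast import literal_eval

# Simplified sketch of the eval_str_paras handling: parameters whose names
# are listed in eval_str_paras are parsed with ast.literal_eval (so a
# stringified tuple becomes an actual tuple); all others pass through.
def sanitize_params_sketch(params, eval_str_paras=()):
    return {
        name: literal_eval(value) if name in eval_str_paras else value
        for name, value in params.items()
    }
```

For example, sanitize_params_sketch({"axis": "('x', 'y')"}, eval_str_paras=["axis"]) returns {"axis": ("x", "y")}.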
- score(datapoint: DatasetT) float | Dict[str, float] [source]#
Run score of the optimized pipeline.
This is a wrapper to maintain API compatibility with Pipeline.
- property search_results_: Dict[str, Sequence[Any]]#
Detailed results of the study.
This basically contains the same information as self.study_.trials_dataframe(), with some small modifications:
- columns starting with “params_” are renamed to “param_”
- a new column called “params” containing all parameters as dict is added
- “value” is renamed to “score”
- the score of pruned trials is set to np.nan
These changes are made to make the output comparable to the output of GridSearch and GridSearchCV.
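The column mapping described above can be sketched on plain dicts. This is a simplified illustration of the output layout, not the actual implementation, and the trial fields used here mirror optuna's trial records only loosely:

```python
import math

# Simplified sketch of how trial records map to search_results_:
# per-parameter "param_*" columns, a combined "params" dict column,
# a "score" column, and NaN scores for pruned trials.
def to_search_results(trials):
    results = {"score": [], "params": []}
    for trial in trials:
        pruned = trial["state"] == "PRUNED"
        results["score"].append(math.nan if pruned else trial["value"])
        results["params"].append(dict(trial["params"]))
        for name, value in trial["params"].items():
            results.setdefault(f"param_{name}", []).append(value)
    return results
```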