tpcp.optimize.optuna.OptunaSearch

class tpcp.optimize.optuna.OptunaSearch(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], T | Aggregator[Any] | Dict[str, T | Aggregator[Any]] | Dict[str, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]]] | Scorer[PipelineT, DatasetT, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]] | None, score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: List[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True)

GridSearch equivalent using Optuna.

An opinionated parameter optimization for simple (i.e. non-optimizable) pipelines that can be used as a replacement for GridSearch.

Parameters:
pipeline

A tpcp pipeline with some hyper-parameters that should be optimized. This should be a normal (i.e. non-optimizable) pipeline when using this class.

get_study_params

A callable that returns a dictionary with the parameters that should be used to create the study (i.e. passed to optuna.create_study). Creating the study is handled via a callable, instead of providing the parameters or the study object directly, to make it possible to create individual studies when optimize is called by an external wrapper (e.g. cross_validate). Further, the provided function is called with a single parameter seed that can be used to create samplers and pruners with different random seeds. This is important for multi-processing (see more in Notes). Note that this callable should return consistent output when called multiple times with the same seed. Otherwise, unexpected behaviour can occur, where different processes use different samplers/pruners in a multi-processing setting (n_jobs > 1).

create_search_space

A callable that takes a Trial object as input and calls suggest_* methods on it to define the search space.
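As a sketch, such a callable only needs to call the suggest_* methods for each parameter that should be searched. The parameter names below are hypothetical and are assumed to correspond to parameters of the pipeline. RecordingTrial is a stand-in for optuna.trial.Trial, included only so the example runs without Optuna installed.

```python
def create_search_space(trial) -> None:
    """Declare the search space by calling suggest_* methods on the trial."""
    # Hypothetical parameter names; replace them with those of your pipeline.
    trial.suggest_float("threshold", 0.1, 0.9)
    trial.suggest_int("window_size", 5, 50, step=5)
    trial.suggest_categorical("method", ["mean", "median"])


class RecordingTrial:
    """Stand-in for optuna.trial.Trial that records the declared names."""

    def __init__(self):
        self.declared = []

    def suggest_float(self, name, low, high, **kwargs):
        self.declared.append(name)
        return low

    def suggest_int(self, name, low, high, **kwargs):
        self.declared.append(name)
        return low

    def suggest_categorical(self, name, choices):
        self.declared.append(name)
        return choices[0]


trial = RecordingTrial()
create_search_space(trial)
print(trial.declared)  # ['threshold', 'window_size', 'method']
```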

scoring

A callable that can score a single data point given a pipeline. This function should return either a single score or a dictionary of scores. If scoring is None, the default score method of the pipeline is used instead.

Note that if scoring returns a dictionary, score_name must be set to the name of the score that should be used for ranking.

score_name

The name of the score that should be used for ranking in case the scoring function returns a dictionary of values.

n_trials

The number of trials. If this argument is set to None, there is no limit on the number of trials, and you should use timeout instead. Because Optuna is called internally by this wrapper, you cannot set up a study without limits and end it using CTRL+C (as suggested by the Optuna docs): doing so would stop the entire execution flow.

timeout

Stop the study after the given number of seconds. If this argument is set to None, the study is executed without a time limit. In this case you should use n_trials to limit the execution.

return_optimized

If True, a pipeline object with the overall best parameters is created. The optimized pipeline object is stored as optimized_pipeline_.

callbacks

List of callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial.
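A minimal sketch of a callback with this signature. It relies only on the number and value attributes of FrozenTrial. The SimpleNamespace object is a stand-in for a real trial, used so the example runs without Optuna installed. With n_jobs > 1, state collected in a module-level list like this is presumably local to each worker process.

```python
from types import SimpleNamespace

finished = []


def record_trial(study, trial) -> None:
    """Callback that accepts a Study and a FrozenTrial, in this order."""
    # Store the trial number and its objective value after each trial.
    finished.append((trial.number, trial.value))


# Stand-in trial object for demonstration purposes only.
record_trial(study=None, trial=SimpleNamespace(number=0, value=0.75))
print(finished)  # [(0, 0.75)]
```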

n_jobs

Number of parallel jobs to use (default = 1 -> single process, -1 -> all available cores). This uses joblib with the multiprocessing backend to parallelize the optimization.

Warning

Read the notes on multiprocessing in CustomOptunaOptimize before using this feature.

random_seed

A random seed that is used as the base for the seed passed to your implementation of get_study_params. If None, this is set to a random integer between 0 and 100 (derived using numpy.random.randint). In case of multiprocessing, this seed is used as an offset to create different seeds for each process.

eval_str_paras

A sequence (tuple/list) of parameter names used by Optuna that should be evaluated using literal_eval instead of being set as strings on the pipeline. The main use case is to allow the user to pass a list of strings to suggest_categorical but have the actual pipeline receive the evaluated value of the chosen string. This is required, as many storage backends of Optuna only support numbers or strings as categorical values.

A typical example would be selecting a set of axes for an algorithm, expressed as a list/tuple of strings. In this case you would use a stringified version of these tuples as the categorical values in the Optuna study and then use eval_str_paras to evaluate the stringified version to the actual tuple.

>>> def search_space(trial):
...     trial.suggest_categorical("axis", ["('x',)", "('y',)", "('z',)", "('x', 'y')"])
>>> optuna_opt = OptunaSearch(pipeline, ..., eval_str_paras=["axis"])
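The evaluation step described above can be pictured as follows. This is a sketch of the idea using ast.literal_eval from the standard library, not tpcp's actual implementation. The threshold parameter is a hypothetical second parameter that is passed through unchanged.

```python
from ast import literal_eval

# Parameters as reported by a trial: the categorical choice is still a string.
trial_params = {"axis": "('x', 'y')", "threshold": 0.5}
eval_str_paras = ["axis"]

# Evaluate only the listed parameters and pass all others through unchanged.
sanitized = {
    name: literal_eval(value) if name in eval_str_paras else value
    for name, value in trial_params.items()
}
print(sanitized)  # {'axis': ('x', 'y'), 'threshold': 0.5}
```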
show_progress_bar

Flag to show progress bars or not.

gc_after_trial

Run the garbage collector after each trial. Check the Optuna documentation for more details.

Other Parameters:
dataset

The dataset instance passed to the optimize method.

Attributes:
search_results_

Detailed results of the study.

optimized_pipeline_

An instance of the input pipeline with the best parameter set. This is only available if return_optimized is True.

best_params_

Parameters of the best trial in the Study.

best_score_

Best score reached in the study.

best_trial_

Best trial in the Study.

study_

The study object itself. This should usually be identical to self.study.

multimetric_

Whether the scorer returned multiple scores.

random_seed_

The actual random seed used for the optimization. This is either the value passed to random_seed or a random integer between 0 and 100.

Methods

clone()

Create a new instance of the class with all parameters copied over.

create_objective()

Create the objective function for optuna.

get_params([deep])

Get parameters for this algorithm.

optimize(dataset, **_)

Optimize the objective over the dataset and find the best parameter combination.

return_optimized_pipeline(pipeline, dataset, ...)

Return the pipeline with the best parameters of a study.

run(datapoint)

Run the optimized pipeline.

safe_run(datapoint)

Run the optimized pipeline.

sanitize_params(params)

Sanitize the parameters of a trial.

score(datapoint)

Run score of the optimized pipeline.

set_params(**params)

Set the parameters of this Algorithm.

__init__(pipeline: PipelineT, get_study_params: Callable[[int], StudyParamsDict], create_search_space: Callable[[Trial], None], *, scoring: Callable[[PipelineT, DatasetT], T | Aggregator[Any] | Dict[str, T | Aggregator[Any]] | Dict[str, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]]] | Scorer[PipelineT, DatasetT, T | Aggregator[Any] | Dict[str, T | Aggregator[Any]]] | None, score_name: str | None = None, n_trials: int | None = None, timeout: float | None = None, callbacks: List[Callable[[Study, FrozenTrial], None]] | None = None, gc_after_trial: bool = False, n_jobs: int = 1, random_seed: int | None = None, eval_str_paras: Sequence[str] = (), show_progress_bar: bool = False, return_optimized: bool = True) -> None

_call_optimize(study: Study, objective: Callable[[Trial], float | Sequence[float]])

Call the optuna study.

This is a separate method to make it easy to modify how the study is called.

property best_params_: Dict[str, Any]

Parameters of the best trial in the Study.

property best_score_: float

Best score reached in the study.

property best_trial_: FrozenTrial

Best trial in the Study.

clone() -> Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and all nested objects.

create_objective() -> Callable[[Trial, PipelineT, DatasetT], float | Sequence[float]]

Create the objective function for optuna.

This is an internal function and should not be called directly.

get_params(deep: bool = True) -> Dict[str, Any]

Get parameters for this algorithm.

Parameters:
deep

Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).

Returns:
params

Parameter names mapped to their values.
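The naming convention for nested parameters can be illustrated with a small sketch. Plain dictionaries stand in for nested algorithm objects here, and flatten_params is a hypothetical helper, not part of tpcp.

```python
def flatten_params(params: dict, prefix: str = "") -> dict:
    """Flatten nested parameters using the ``name__`` prefix convention."""
    flat = {}
    for name, value in params.items():
        flat[prefix + name] = value
        # Plain dicts stand in for nested algorithm objects in this sketch.
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{prefix}{name}__"))
    return flat


nested = {"n_trials": 10, "pipeline": {"threshold": 0.5}}
flat = flatten_params(nested)
print(sorted(flat))  # ['n_trials', 'pipeline', 'pipeline__threshold']
```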

optimize(dataset: DatasetT, **_: Any) -> Self

Optimize the objective over the dataset and find the best parameter combination.

This method calls self.create_objective to obtain the objective function that should be optimized.

Parameters:
dataset

The dataset used for optimization.

return_optimized_pipeline(pipeline: PipelineT, dataset: DatasetT, study: Study) -> PipelineT

Return the pipeline with the best parameters of a study.

This is an internal function and should not be called directly.

run(datapoint: DatasetT) -> PipelineT

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

safe_run(datapoint: DatasetT) -> PipelineT

Run the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

sanitize_params(params: Dict[str, Any]) -> Dict[str, Any]

Sanitize the parameters of a trial.

This will apply the str evaluation controlled by self.eval_str_paras to the parameters. Call this method before passing the parameters to the pipeline in your objective function.

score(datapoint: DatasetT) -> float | Dict[str, float]

Run score of the optimized pipeline.

This is a wrapper to maintain API compatibility with Pipeline.

property search_results_: Dict[str, Sequence[Any]]

Detailed results of the study.

This basically contains the same information as self.study_.trials_dataframe(), with some small modifications:

  • columns starting with “params_” are renamed to “param_”

  • a new column called “params” containing all parameters as dict is added

  • “value” is renamed to “score”

  • the score of pruned trials is set to np.nan

These changes are made to make the output comparable to the output of GridSearch and GridSearchCV.
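The modifications listed above can be sketched on plain dictionaries. The rows below are made-up values shaped like the output of trials_dataframe(), and to_search_results is a hypothetical helper that only illustrates the renaming rules. It is not tpcp's implementation.

```python
import math

# Two made-up rows shaped like the output of study.trials_dataframe().
trials = [
    {"number": 0, "value": 0.8, "state": "COMPLETE", "params_threshold": 0.3},
    {"number": 1, "value": 0.4, "state": "PRUNED", "params_threshold": 0.7},
]


def to_search_results(rows):
    """Apply the renaming rules and return a dict of columns."""
    results = {}
    for row in rows:
        converted, params = {}, {}
        for key, val in row.items():
            if key.startswith("params_"):
                # "params_<name>" becomes "param_<name>".
                name = key[len("params_"):]
                converted[f"param_{name}"] = val
                params[name] = val
            elif key == "value":
                # "value" becomes "score"; pruned trials get NaN.
                converted["score"] = math.nan if row["state"] == "PRUNED" else val
            else:
                converted[key] = val
        # New column holding all parameters of the trial as one dict.
        converted["params"] = params
        for key, val in converted.items():
            results.setdefault(key, []).append(val)
    return results


search_results = to_search_results(trials)
print(search_results["param_threshold"])  # [0.3, 0.7]
```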

set_params(**params: Any) -> Self

Set the parameters of this Algorithm.

To set parameters of nested objects, use nested_object_name__para_name=value.

Examples using tpcp.optimize.optuna.OptunaSearch

Custom Optuna Optimizer

Build-in Optuna Optimizers
