Built-in Optuna Optimizers#

The custom Optuna example shows how to implement a specific Optuna optimizer with full control over all aspects. This is still the recommended approach, because you will often have specific requirements for your objective function.

However, a number of problems can still be solved with a relatively generic GridSearch or GridSearchCV. For these use cases, we provide Optuna equivalents that let you make use of the advanced samplers Optuna provides.

Note

We still recommend reading through the custom Optuna example before using the specific implementations demonstrated here.

OptunaSearch - GridSearch on Steroids#

The OptunaSearch class can be used in all cases where you would use GridSearch. The following is equivalent to the GridSearch example (Grid Search optimal Algorithm Parameter).

from pathlib import Path

import pandas as pd

from examples.algorithms.algorithms_qrs_detection_final import QRSDetector
from examples.datasets.datasets_final_ecg import ECGExampleData
from tpcp import Parameter, Pipeline, cf

try:
    HERE = Path(__file__).parent
except NameError:
    HERE = Path().resolve()
data_path = HERE.parent.parent / "example_data/ecg_mit_bih_arrhythmia/data"
example_data = ECGExampleData(data_path)


class MyPipeline(Pipeline[ECGExampleData]):
    algorithm: Parameter[QRSDetector]

    r_peak_positions_: pd.Series

    def __init__(self, algorithm: QRSDetector = cf(QRSDetector())):
        self.algorithm = algorithm

    def run(self, datapoint: ECGExampleData):
        # Note: We need to clone the algorithm instance, to make sure we don't leak any data between runs.
        algo = self.algorithm.clone()
        algo.detect(datapoint.data["ecg"], datapoint.sampling_rate_hz)

        self.r_peak_positions_ = algo.r_peak_positions_
        return self


pipe = MyPipeline()
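The search space defined further down addresses the nested algorithm parameters with double-underscore names such as `algorithm__min_r_peak_height_over_baseline`, following the scikit-learn style `get_params`/`set_params` convention that tpcp also uses (see the `get_params()` output at the end of this example). The toy sketch below shows how such a name resolves to a nested attribute. `ToyDetector`, `ToyPipeline` and `set_nested_params` are hypothetical stand-ins, not tpcp classes. The dataclass `default_factory` is only loosely comparable to `cf(...)` in `MyPipeline`: it gives every pipeline its own detector instead of one shared default object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDetector:
    """Hypothetical stand-in for QRSDetector (only two parameters)."""

    min_r_peak_height_over_baseline: float = 1.0
    high_pass_filter_cutoff_hz: float = 0.5


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for MyPipeline."""

    # default_factory creates a fresh detector for every pipeline instance.
    algorithm: ToyDetector = field(default_factory=ToyDetector)


def set_nested_params(obj, **params):
    """Set parameters given as 'outer__inner' names by walking the attribute path."""
    for name, value in params.items():
        *path, attr = name.split("__")
        target = obj
        for part in path:
            target = getattr(target, part)
        setattr(target, attr, value)
    return obj


pipe_a = set_nested_params(ToyPipeline(), algorithm__min_r_peak_height_over_baseline=0.4)
pipe_b = ToyPipeline()
print(pipe_a.algorithm.min_r_peak_height_over_baseline)  # 0.4
print(pipe_b.algorithm.min_r_peak_height_over_baseline)  # 1.0, the default was not shared
```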

Optuna Study#

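To use Optuna, we need an Optuna study. Instead of creating the study ourselves, we provide a function that takes a random seed and returns the parameters (here, the optimization direction and the sampler) that OptunaSearch uses to create the study. We set this up identically to the custom Optuna example.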

Note

We use an in-memory study here. If you want to use multiprocessing or make sure that your search can be continued later, use a different study backend.

from optuna import Trial, samplers


def get_study_params(seed):
    # We use a simple RandomSampler, but every optuna sampler will work
    sampler = samplers.RandomSampler(seed=seed)
    return {"direction": "maximize", "sampler": sampler}

Search Space#

In contrast to GridSearch, where we define a fixed parameter grid, in Optuna we define a search space. Which values in this search space are actually evaluated depends on the chosen sampler. The search space also needs to be provided as a function, which takes the current trial object as input and suggests a value for each parameter.

def create_search_space(trial: Trial):
    trial.suggest_float("algorithm__min_r_peak_height_over_baseline", 0.1, 2, step=0.1)
    trial.suggest_float("algorithm__high_pass_filter_cutoff_hz", 0.1, 2, step=0.1)
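To get a feeling for the size of this search space, here is a small standard-library sketch (not Optuna code) that enumerates the values a `suggest_float` call with `step=0.1` between 0.1 and 2 can take, i.e. 0.1, 0.2, ..., 2.0. With two such parameters, an exhaustive GridSearch over the same discretization would have to evaluate 400 parameter combinations, whereas the search below only samples `n_trials=10` of them. The helper name `discrete_float_values` is hypothetical.

```python
from itertools import product


def discrete_float_values(low: float, high: float, step: float) -> list[float]:
    """Hypothetical helper: enumerate low, low + step, ..., high.

    Values are rounded to suppress floating point noise such as 0.30000000000000004.
    """
    n_steps = round((high - low) / step)
    return [round(low + i * step, 10) for i in range(n_steps + 1)]


# Same bounds and step as in `create_search_space` above.
peak_height_values = discrete_float_values(0.1, 2, 0.1)
cutoff_values = discrete_float_values(0.1, 2, 0.1)

# All combinations a full grid search over this discretization would need to evaluate.
grid = list(product(peak_height_values, cutoff_values))
print(len(peak_height_values))  # 20
print(len(grid))  # 400
```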

Score#

We use the same scoring function as in the GridSearch example:

from examples.algorithms.algorithms_qrs_detection_final import match_events_with_reference, precision_recall_f1_score


def score(pipeline: MyPipeline, datapoint: ECGExampleData):
    # We use the `safe_run` wrapper instead of just run. This is always a good idea.
    # We don't need to clone the pipeline here, as OptunaSearch will already clone the pipeline internally.
    pipeline = pipeline.safe_run(datapoint)
    tolerance_s = 0.02  # We just use 20 ms for this example
    matches = match_events_with_reference(
        pipeline.r_peak_positions_.to_numpy(),
        datapoint.r_peak_positions_.to_numpy(),
        tolerance=tolerance_s * datapoint.sampling_rate_hz,
    )
    precision, recall, f1_score = precision_recall_f1_score(matches)
    return {"precision": precision, "recall": recall, "f1_score": f1_score}
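The `tolerance` passed to `match_events_with_reference` is given in samples, which is why the 20 ms are multiplied by the sampling rate (for example, at 360 Hz, the sampling rate of the MIT-BIH Arrhythmia recordings, this corresponds to 7.2 samples). The following standard-library sketch illustrates the general idea of tolerance-based event matching and how precision, recall and F1 score follow from the match counts. It uses a simple greedy one-to-one matching; `count_matches` and `precision_recall_f1` are hypothetical helpers, not the implementations from `examples.algorithms`, which may differ in details such as tie handling or empty inputs.

```python
def count_matches(detected, reference, tolerance):
    """Hypothetical helper: greedily match each detected event to the closest
    unused reference event within `tolerance` samples.

    Returns (true positives, false positives, false negatives).
    """
    unused_reference = list(reference)
    tp = 0
    for d in sorted(detected):
        candidates = [r for r in unused_reference if abs(d - r) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda r: abs(d - r))
            unused_reference.remove(closest)  # each reference peak can only be matched once
            tp += 1
    return tp, len(detected) - tp, len(reference) - tp


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 score, using 0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


sampling_rate_hz = 360  # assumed value, matches the MIT-BIH recordings
tolerance = 0.02 * sampling_rate_hz  # 20 ms expressed in samples
reference = [100, 460, 820, 1180]  # toy reference R-peaks (sample indices)
detected = [103, 455, 900, 1182]  # toy detections, the peak at 900 is misplaced

counts = count_matches(detected, reference, tolerance)
print(counts)  # (3, 1, 1)
print(precision_recall_f1(*counts))  # (0.75, 0.75, 0.75)
```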

Running the search#

Now we can run the search. Note that, because our scoring function returns a dictionary, we need to specify which value to optimize by passing its key as score_name. In this case, we want to maximize the F1 score. We run 10 trials (n_trials=10) with a fixed random_seed.

from tpcp.optimize.optuna import OptunaSearch

opti = OptunaSearch(
    pipe, get_study_params, create_search_space, scoring=score, n_trials=10, score_name="f1_score", random_seed=42
)
opti = opti.optimize(example_data)

Inspecting the results#

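The results have a structure similar to the output of GridSearch. Besides the aggregated scores, they contain the scores for each individual datapoint (the single_* columns), the labels of the datapoints (data_labels), and Optuna-specific information about each trial, such as its start and end time, duration, and state.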

results = pd.DataFrame(opti.search_results_)
results

	datetime_start	datetime_complete	duration	param_algorithm__high_pass_filter_cutoff_hz	param_algorithm__min_r_peak_height_over_baseline	state	data_labels	precision	recall	f1_score	single_precision	single_recall	single_f1_score	params
0	2024-04-17 14:47:08.663466	2024-04-17 14:47:09.424303	0 days 00:00:00.760837	2.0	0.8	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.975941	0.742356	0.778327	[1.0, 0.9739256397875422, 0.967756381549485, 0...	[0.9995600527936648, 0.922267946959305, 0.9694...	[0.9997799779977998, 0.9473931423203381, 0.968...	{'algorithm__min_r_peak_height_over_baseline':...
1	2024-04-17 14:47:09.425118	2024-04-17 14:47:10.277504	0 days 00:00:00.852386	1.2	1.5	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.827515	0.308203	0.341468	[1.0, 1.0, 1.0, 0.9347826086956522, 0.99823943...	[0.015398152221733392, 0.0004572473708276177, ...	[0.030329289428076254, 0.0009140767824497258, ...	{'algorithm__min_r_peak_height_over_baseline':...
2	2024-04-17 14:47:10.278226	2024-04-17 14:47:11.124368	0 days 00:00:00.846142	0.4	0.4	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.874715	0.869669	0.858757	[0.9995600527936648, 0.9711934156378601, 0.935...	[0.9995600527936648, 0.9711934156378601, 0.967...	[0.9995600527936648, 0.9711934156378601, 0.951...	{'algorithm__min_r_peak_height_over_baseline':...
3	2024-04-17 14:47:11.125104	2024-04-17 14:47:11.997710	0 days 00:00:00.872606	1.8	0.2	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.787612	0.902061	0.829179	[0.9991204925241864, 0.9461024498886415, 0.907...	[0.9995600527936648, 0.9711934156378601, 0.969...	[0.9993402243237299, 0.9584837545126353, 0.937...	{'algorithm__min_r_peak_height_over_baseline':...
4	2024-04-17 14:47:11.998475	2024-04-17 14:47:12.835238	0 days 00:00:00.836763	1.5	1.3	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.913540	0.428422	0.472669	[1.0, 1.0, 0.9914163090128756, 0.9884959522795...	[0.3783545974483062, 0.002286236854138089, 0.2...	[0.5489945738908394, 0.0045620437956204385, 0....	{'algorithm__min_r_peak_height_over_baseline':...
5	2024-04-17 14:47:12.835988	2024-04-17 14:47:13.699600	0 days 00:00:00.863612	2.0	0.1	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.628790	0.916112	0.735941	[0.9109863672814755, 0.49212233549582945, 0.54...	[0.9995600527936648, 0.9711934156378601, 0.969...	[0.9532200545416404, 0.6532369675534369, 0.699...	{'algorithm__min_r_peak_height_over_baseline':...
6	2024-04-17 14:47:13.700307	2024-04-17 14:47:14.530838	0 days 00:00:00.830531	0.5	1.7	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.734090	0.287615	0.316558	[1.0, 0, 1.0, 0.8153846153846154, 0.9962157048...	[0.0004399472063352398, 0, 0.05787348586810229...	[0.0008795074758135445, 0, 0.10941475826972011...	{'algorithm__min_r_peak_height_over_baseline':...
7	2024-04-17 14:47:14.531602	2024-04-17 14:47:15.381996	0 days 00:00:00.850394	0.4	0.4	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.874715	0.869669	0.858757	[0.9995600527936648, 0.9711934156378601, 0.935...	[0.9995600527936648, 0.9711934156378601, 0.967...	[0.9995600527936648, 0.9711934156378601, 0.951...	{'algorithm__min_r_peak_height_over_baseline':...
8	2024-04-17 14:47:15.382736	2024-04-17 14:47:16.228501	0 days 00:00:00.845765	1.1	0.7	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.965966	0.818186	0.853251	[0.9995600527936648, 0.9717514124293786, 0.965...	[0.9995600527936648, 0.943758573388203, 0.9681...	[0.9995600527936648, 0.9575504523312457, 0.966...	{'algorithm__min_r_peak_height_over_baseline':...
9	2024-04-17 14:47:16.229293	2024-04-17 14:47:17.065983	0 days 00:00:00.836690	0.6	0.9	COMPLETE	[(group_1, 100), (group_2, 102), (group_3, 104...	0.984398	0.744233	0.786534	[0.9995600527936648, 0.9739130434782609, 0.966...	[0.9995600527936648, 0.9218106995884774, 0.956...	[0.9995600527936648, 0.9471458773784356, 0.961...	{'algorithm__min_r_peak_height_over_baseline':...
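Judging from the table, the aggregated columns are not derived from each other: for trial 0, the F1 score of the aggregated precision (0.976) and recall (0.742) would be roughly 0.84, while the reported f1_score is 0.778. This is consistent with every metric being averaged separately over its single_* values; we assume a plain mean here, and it is this aggregated value, selected via score_name, that the study maximizes. The sketch below reproduces that behavior with hypothetical per-datapoint scores (the second datapoint corresponds to 3 hits, 1 false detection and 5 missed peaks). `aggregate_scores` is a hypothetical helper, not tpcp's implementation.

```python
from statistics import mean


def aggregate_scores(single_scores, score_name):
    """Hypothetical helper: average every metric over the datapoints and
    return the aggregated dict plus the value selected by `score_name`."""
    metric_names = single_scores[0].keys()
    aggregated = {name: mean(s[name] for s in single_scores) for name in metric_names}
    return aggregated, aggregated[score_name]


# Hypothetical outputs of `score` for two datapoints.
single_scores = [
    {"precision": 1.0, "recall": 1.0, "f1_score": 1.0},
    {"precision": 0.75, "recall": 0.375, "f1_score": 0.5},
]
aggregated, objective = aggregate_scores(single_scores, score_name="f1_score")
print(aggregated)  # {'precision': 0.875, 'recall': 0.6875, 'f1_score': 0.75}
print(objective)  # 0.75

# For comparison: the F1 score computed from the averaged precision and recall.
f1_of_means = 2 * aggregated["precision"] * aggregated["recall"] / (aggregated["precision"] + aggregated["recall"])
print(round(f1_of_means, 2))  # 0.77
```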

We can also get the best parameter combination, the corresponding score, and an instance of the pipeline initialized with the best parameter combination.

print("Best Para Combi:", opti.best_params_)
print("Best score:", opti.best_score_)
print("Paras of optimized Pipeline:", opti.optimized_pipeline_.get_params())

Best Para Combi: {'algorithm__min_r_peak_height_over_baseline': 0.4, 'algorithm__high_pass_filter_cutoff_hz': 0.4}
Best score: 0.858757056619628
Paras of optimized Pipeline: {'algorithm__high_pass_filter_cutoff_hz': 0.4, 'algorithm__max_heart_rate_bpm': 200.0, 'algorithm__min_r_peak_height_over_baseline': 0.4, 'algorithm': QRSDetector(high_pass_filter_cutoff_hz=0.4, max_heart_rate_bpm=200.0, min_r_peak_height_over_baseline=0.4)}
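Since best_params_ only reports the winning combination, it can be useful to rank all trials yourself. Note also that trials 2 and 7 in the results table sampled the identical combination (0.4, 0.4): Optuna's RandomSampler draws every trial independently and can therefore revisit points of a discrete search space. The pandas sketch below uses a few rows copied (rounded) from that table as a stand-in for `pd.DataFrame(opti.search_results_)`, drops repeated parameter combinations, and sorts the remaining trials by f1_score.

```python
import pandas as pd

param_cols = [
    "param_algorithm__high_pass_filter_cutoff_hz",
    "param_algorithm__min_r_peak_height_over_baseline",
]

# Stand-in for `pd.DataFrame(opti.search_results_)`: trials 0, 2, 7 and 8 (rounded values).
results = pd.DataFrame(
    {
        param_cols[0]: [2.0, 0.4, 0.4, 1.1],
        param_cols[1]: [0.8, 0.4, 0.4, 0.7],
        "f1_score": [0.778, 0.859, 0.859, 0.853],
    },
    index=[0, 2, 7, 8],
)

# Keep only the first trial per parameter combination, then rank by the optimized metric.
ranked = results.drop_duplicates(subset=param_cols).sort_values("f1_score", ascending=False)
print(ranked)
```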

Total running time of the script: (0 minutes 9.402 seconds)

Estimated memory usage: 29 MB
