{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Optimization Info\n\nTpcp is focused on \"running\" pipelines and less on the \"optimization\" step.\nThis is great for traditional algorithms and algorithms will complex return values, as you can easily store multiple\nparameters as attributes on the object during `run`.\n\nHowever, for optimization, you are limited to modifying input parameters.\nThis works well in many cases, but sometimes, you need additional information from the optimization.\nFor example, you might want to extract the loss decay of an iterative learning algorithms.\nThis information is something that you wouldn't want to store in the input parameters (usually).\n\nFor these cases tpcp provides the `self_optimize_with_info` method.\nThis is basically identical to `self_optimize`, but is expected to provide two return values: the optimized instance\nAND an arbitrary additional object containing any information you like.\nMethods that get optimizable pipelines as input (e.g. :class:`~tpcp.optimize.Optimize` are aware of these method and\nwill call `self_optimize_with_info` if available and store the additional info as result objects.\n\nThe :class:`~tpcp.OptimizablePipeline` base-class is implemented in a way that you only need to worry about\nimplementing either the `self_optimize_with_info` or the `self_optimize` method.\nThe other will be available automatically (the additional info will be `NOTHING`, if the method is not implemented).\n\nIf you are implementing a new Algorithm (instead of a pipeline), we don't provide this additional support,\nbut it is relatively simple to implement yourself.\n\nIn the following we will show how all of this works by expanding the QRS detection algorithm implemented in the other\nexamples to return additional information from the optimization.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from typing import Dict, List, Tuple\n\nimport numpy as np\nimport pandas as pd\nfrom sklearn.metrics import roc_curve\nfrom typing_extensions import Self\n\nfrom examples.algorithms.algorithms_qrs_detection_final import (\n    OptimizableQrsDetector,\n    QRSDetector,\n    match_events_with_reference,\n)\nfrom examples.datasets.datasets_final_ecg import ECGExampleData\nfrom tpcp import HyperParameter, OptimizableParameter, OptimizablePipeline, Parameter, cf, make_optimize_safe\nfrom tpcp.optimize import Optimize"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In the algorithm class below, we basically reimplemented the `OptimizableQrsDetector` from the algorithm example.\nHowever, instead of the `self_optimize` method, we implemented the `self_optimize_with_info` method and added\nadditional information from the threshold selection process to the output of the optimization.\n\nTo ensure interface compatibility with other algorithms, we also provided a `self_optimize` method, that simply calls\n`self_optimize_with_info` under the hood.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class OptimizableQrsDetectorWithInfo(QRSDetector):\n    min_r_peak_height_over_baseline: OptimizableParameter[float]\n    r_peak_match_tolerance_s: HyperParameter[float]\n\n    def __init__(\n        self,\n        max_heart_rate_bpm: float = 200.0,\n        min_r_peak_height_over_baseline: float = 1.0,\n        r_peak_match_tolerance_s: float = 0.01,\n        high_pass_filter_cutoff_hz: float = 1,\n    ):\n        self.r_peak_match_tolerance_s = r_peak_match_tolerance_s\n        super().__init__(\n            max_heart_rate_bpm=max_heart_rate_bpm,\n            min_r_peak_height_over_baseline=min_r_peak_height_over_baseline,\n            high_pass_filter_cutoff_hz=high_pass_filter_cutoff_hz,\n        )\n\n    @make_optimize_safe\n    def self_optimize_with_info(\n        self, ecg_data: List[pd.Series], r_peaks: List[pd.Series], sampling_rate_hz: float\n    ) -> Tuple[Self, Dict[str, np.ndarray]]:\n        all_labels = []\n        all_peak_heights = []\n        for d, p in zip(ecg_data, r_peaks):\n            filtered = self._filter(d.to_numpy().flatten(), sampling_rate_hz)\n            # Find all potential peaks without the height threshold\n            potential_peaks = self._search_strategy(filtered, sampling_rate_hz, use_height=False)\n            # Determine the label for each peak, by matching them with our ground truth\n            labels = np.zeros(potential_peaks.shape)\n            matches = match_events_with_reference(\n                events=np.atleast_2d(potential_peaks).T,\n                reference=np.atleast_2d(p.to_numpy().astype(int)).T,\n                tolerance=self.r_peak_match_tolerance_s * sampling_rate_hz,\n            )\n            tp_matches = matches[(~np.isnan(matches)).all(axis=1), 0].astype(int)\n            labels[tp_matches] = 1\n            labels = labels.astype(bool)\n            all_labels.append(labels)\n            all_peak_heights.append(filtered[potential_peaks])\n        all_labels = np.hstack(all_labels)\n        
all_peak_heights = np.hstack(all_peak_heights)\n        # We \"brute-force\" a good cutoff by testing a bunch of thresholds and then calculating the Youden Index for\n        # each.\n        fpr, tpr, thresholds = roc_curve(all_labels, all_peak_heights)\n        youden_index = tpr - fpr\n        # The best Youden index gives us a balance between sensitivity and specificity.\n        self.min_r_peak_height_over_baseline = thresholds[np.argmax(youden_index)]\n\n        # Here we create the additional info object:\n        additional_info = {\"all_youden_index\": youden_index, \"all_thresholds\": thresholds}\n        return self, additional_info\n\n    def self_optimize(self, ecg_data: List[pd.Series], r_peaks: List[pd.Series], sampling_rate_hz: float) -> Self:\n        return self.self_optimize_with_info(ecg_data=ecg_data, r_peaks=r_peaks, sampling_rate_hz=sampling_rate_hz)[0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To use this algorithm in an optimization, we need a pipeline to wrap it.\nBelow we can find a reimplementation of the pipline from the \"Optimizable Pipeline\" example.\n\nHowever, instead of implementing `self_optimize` method, we implemented the `self_optimize_with_info` method and\nalso called the `self_optimize_with_info` of our algorithm under the hood.\n\nNote, that for pipelines, we don't need to implement a dummy `self_optimize` method.\nOur baseclass already takes care of that.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class MyPipeline(OptimizablePipeline[ECGExampleData]):\n    algorithm: Parameter[OptimizableQrsDetectorWithInfo]\n    algorithm__min_r_peak_height_over_baseline: OptimizableParameter[float]\n\n    r_peak_positions_: pd.Series\n\n    def __init__(self, algorithm: OptimizableQrsDetectorWithInfo = cf(OptimizableQrsDetectorWithInfo())):\n        self.algorithm = algorithm\n\n    @make_optimize_safe\n    def self_optimize_with_info(self, dataset: ECGExampleData, **kwargs):\n        ecg_data = [d.data[\"ecg\"] for d in dataset]\n        r_peaks = [d.r_peak_positions_[\"r_peak_position\"] for d in dataset]\n        # Note: We need to clone the algorithm instance, to make sure we don't leak any data between runs.\n        algo = self.algorithm.clone()\n        # Here we call the `self_optimize_with_info` method!\n        self.algorithm, additional_data = algo.self_optimize_with_info(ecg_data, r_peaks, dataset.sampling_rate_hz)\n        return self, additional_data\n\n    def run(self, datapoint: ECGExampleData):\n        # Note: We need to clone the algorithm instance, to make sure we don't leak any data between runs.\n        algo = self.algorithm.clone()\n        algo.detect(datapoint.data[\"ecg\"], datapoint.sampling_rate_hz)\n\n        self.r_peak_positions_ = algo.r_peak_positions_\n        return self"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's test this class!\n\nHowever, first we need some test data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pathlib import Path\n\nfrom sklearn.model_selection import train_test_split\n\ntry:\n    HERE = Path(__file__).parent\nexcept NameError:\n    HERE = Path(\".\").resolve()\ndata_path = HERE.parent.parent / \"example_data/ecg_mit_bih_arrhythmia/data\"\nexample_data = ECGExampleData(data_path)\n\ntrain_set, test_set = train_test_split(example_data, train_size=0.7, random_state=0)\n# We only want a single dataset in the test set\ntest_set = test_set[0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With the train data we can try out the optimization\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimized_pipe, info = MyPipeline().self_optimize_with_info(train_set)\ninfo"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimized_pipe"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "But we can also just call the auto-generated `self_optimize` method and don't get the info:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimized_pipe = MyPipeline().self_optimize(train_set)\noptimized_pipe"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "However, in most cases, we should just use the `Optimize` wrapper.\nIt will call the `self_optimize_with_info` method if available (you can force it to use `self_optimize` using the\n`optimize_with_info` parameter) and then provide the additional info as attribute\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimizer = Optimize(MyPipeline()).optimize(train_set)\n\noptimizer.optimization_info_"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimizer.optimized_pipeline_"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As `Optimize` is aware of this and stores the info as a result attribute, the information is also available in the\noutput of a cross validation.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Further Notes\nSometimes it might be a good idea to provide separate implementation of `self_optimize` and `self_optimize_with_info`.\nThis might be required, when collecting and calculating the additional info creates a relevant computational overhead.\nHowever, you should make sure, that the two methods return the same optimization result otherwise.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}