{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Grid Search for Optimal Algorithm Parameters\n\nIf no better way exists to optimize the parameters of an algorithm or pipeline, an exhaustive grid search might be\na good idea.\n`tpcp` provides a `GridSearch` that is algorithm-agnostic (as long as you can wrap your algorithm into a pipeline).\n\nAs an example, we are going to grid-search some parameters of the `QRSDetector` we implemented in\n`custom_algorithms_qrs_detection`.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To perform a GridSearch (or any other form of parameter optimization in `tpcp`), we first need to have a\n**Dataset**, a **Pipeline**, and a **score** function.\n\n## 1. The Dataset\nDatasets wrap multiple recordings into an easy-to-use interface that can be passed around between the higher-level\n`tpcp` functions.\nLearn more about datasets in `custom_datasets`.\nIf you are lucky, you do not need to create the dataset on your own, because someone has already created a dataset\nfor the data you want to use.\n\nHere, we are just going to reuse the `ECGExampleData` dataset we created in `custom_dataset_ecg`.\n\nFor our GridSearch, we need an instance of this dataset.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pathlib import Path\n\nimport numpy as np\n\nfrom examples.datasets.datasets_final_ecg import ECGExampleData\n\ntry:\n    HERE = Path(__file__).parent\nexcept NameError:\n    HERE = Path(\".\").resolve()\ndata_path = HERE.parent.parent / \"example_data/ecg_mit_bih_arrhythmia/data\"\nexample_data = ECGExampleData(data_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. The Pipeline\nThe pipeline simply defines which algorithms we want to run on our data and which parameters of the pipeline\nyou still want to be able to modify (e.g. to optimize them in the GridSearch).\n\nThe pipeline usually needs 3 things:\n\n1. It needs to be a subclass of `tpcp.Pipeline`.\n2. It needs to have a `run` method that runs all the algorithmic steps and stores the results as attributes of the\n   pipeline object (by convention with a trailing underscore, e.g. `r_peak_positions_`).\n   The `run` method should expect only a single data point (in our case a single recording of one sensor) as input.\n3. An `__init__` that defines all parameters that should be adjustable. Note that the names in the function signature\n   of the `__init__` method **must** match the corresponding attribute names (e.g. `max_cost` -> `self.max_cost`).\n   If you want to adjust multiple parameters that all belong to the same algorithm, it might also be convenient to\n   just pass the algorithm as a parameter. However, keep potential issues with mutable defaults in mind (see\n   `mutable_defaults` for more info), which is why the default algorithm below is wrapped in `cf`.\n\nHere we simply extract the data and sampling rate from the datapoint and then run the algorithm.\nWe store the final results we are interested in on the pipeline object.\n\nFor the final GridSearch, we need an instance of the pipeline object.\n\n"
      ]
    },
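    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Why must the `__init__` argument names match the attribute names? As in sklearn, whose conventions tpcp follows here,\nthe parameters of an object are (roughly speaking) discovered by inspecting the `__init__` signature and then reading\nthe attributes with the same names.\nThe following cell is a minimal, simplified sketch of that idea, not tpcp's actual implementation.\n`HypotheticalBase`, `get_params_sketch`, `GoodPipeline`, and `BadPipeline` are made-up names for illustration only.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import inspect\n\n\nclass HypotheticalBase:\n    # Simplified, hypothetical stand-in for the parameter handling of a tpcp Pipeline.\n    def get_params_sketch(self):\n        # Collect the argument names of `__init__` (excluding `self`) ...\n        names = [name for name in inspect.signature(type(self).__init__).parameters if name != 'self']\n        # ... and look up an attribute with exactly the same name for each of them.\n        return {name: getattr(self, name) for name in names}\n\n\nclass GoodPipeline(HypotheticalBase):\n    def __init__(self, max_cost=10):\n        self.max_cost = max_cost  # attribute name matches the argument name\n\n\nclass BadPipeline(HypotheticalBase):\n    def __init__(self, max_cost=10):\n        self.cost = max_cost  # attribute name does NOT match the argument name\n\n\ngood_params = GoodPipeline(max_cost=5).get_params_sketch()\nprint(good_params)\n\ntry:\n    BadPipeline().get_params_sketch()\n    bad_failed = False\nexcept AttributeError:\n    bad_failed = True\nprint('BadPipeline failed:', bad_failed)"
      ]
    },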
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n\nfrom examples.algorithms.algorithms_qrs_detection_final import QRSDetector\nfrom tpcp import Parameter, Pipeline, cf\n\n\nclass MyPipeline(Pipeline[ECGExampleData]):\n    algorithm: Parameter[QRSDetector]\n\n    r_peak_positions_: pd.Series\n\n    def __init__(self, algorithm: QRSDetector = cf(QRSDetector())):\n        self.algorithm = algorithm\n\n    def run(self, datapoint: ECGExampleData):\n        # Note: We need to clone the algorithm instance, to make sure we don't leak any data between runs.\n        algo = self.algorithm.clone()\n        algo.detect(datapoint.data[\"ecg\"], datapoint.sampling_rate_hz)\n\n        self.r_peak_positions_ = algo.r_peak_positions_\n        return self\n\n\npipe = MyPipeline()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. The scorer\nIn the context of a grid search, we want to calculate the performance of our algorithm and rank the different\nparameter candidates accordingly.\nThis is what our score function is for.\nIt gets a pipeline object (**without** results!) and a data point (i.e. a single recording) as input and should\nreturn some sort of performance metric.\nA higher value is always considered better.\nIf you want to calculate multiple performance measures, you can also return a dictionary of such values.\nIn any case, the performance for a specific parameter combination in the GridSearch will be calculated as the mean\nover all datapoints.\n(If you want to change this, you can create a custom subclass of `tpcp.validate.Scorer`.)\n\nA typical score function will first call `safe_run` (which calls `run` internally) on the pipeline and then\ncompare the output with some reference.\nThis reference should be supplied as part of the dataset (here, `datapoint.r_peak_positions_`).\n\nInstead of using a function as scorer (shown here), you can also implement a method called `score` on your pipeline.\nThen just pass `None` (which is the default) for the `scoring` parameter in the GridSearch (and other optimizers).\nHowever, a function is usually more flexible.\n\nIn this case we compare the identified R-peaks with the reference and identify which R-peaks were correctly\nfound within a certain margin around the reference points.\nBased on these matches, we calculate the precision, the recall, and the F1-score using some helper functions.\n\n"
      ]
    },
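    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the matching and the metrics more concrete, the following cell contains a simplified, hypothetical sketch.\n`sketch_match_events` greedily pairs each reference peak with the closest unmatched detected peak within the tolerance\n(given in samples); unmatched detections count as false positives and unmatched reference peaks as false negatives.\nThis is not necessarily how `match_events_with_reference` and `precision_recall_f1_score` are implemented; the helper\nnames and the peak positions are made up and only illustrate the idea.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n\ndef sketch_match_events(detected, reference, tolerance):\n    # Hypothetical greedy matching (not the helper used in this example): each reference peak is paired with the\n    # closest still-unmatched detected peak, if that peak lies within `tolerance` samples.\n    detected = np.asarray(detected, dtype=float)\n    used = np.zeros(len(detected), dtype=bool)\n    true_positives = 0\n    for ref in np.asarray(reference, dtype=float):\n        distances = np.abs(detected - ref)\n        distances[used] = np.inf  # every detection can be matched at most once\n        if len(distances) > 0 and distances.min() <= tolerance:\n            used[distances.argmin()] = True\n            true_positives += 1\n    false_positives = int((~used).sum())  # detections without a reference partner\n    false_negatives = len(reference) - true_positives  # reference peaks that were missed\n    return true_positives, false_positives, false_negatives\n\n\ndef sketch_precision_recall_f1(true_positives, false_positives, false_negatives):\n    # Standard definitions; return 0.0 where a denominator would be zero.\n    found = true_positives + false_positives\n    relevant = true_positives + false_negatives\n    precision = true_positives / found if found else 0.0\n    recall = true_positives / relevant if relevant else 0.0\n    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0\n    return precision, recall, f1\n\n\n# Made-up peak positions in samples; at an assumed 360 Hz, a 20 ms tolerance is 7.2 samples.\ntoy_counts = sketch_match_events([100, 205, 400, 650], [102, 300, 398, 652], tolerance=0.02 * 360)\ntoy_precision, toy_recall, toy_f1 = sketch_precision_recall_f1(*toy_counts)\nprint(toy_counts, toy_precision, toy_recall, toy_f1)"
      ]
    },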
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from examples.algorithms.algorithms_qrs_detection_final import match_events_with_reference, precision_recall_f1_score\n\n\ndef score(pipeline: MyPipeline, datapoint: ECGExampleData):\n    # We use the `safe_run` wrapper instead of just run. This is always a good idea.\n    # We don't need to clone the pipeline here, as GridSearch will already clone the pipeline internally and `run`\n    # will clone it again.\n    pipeline = pipeline.safe_run(datapoint)\n    tolerance_s = 0.02  # We just use 20 ms for this example\n    matches = match_events_with_reference(\n        pipeline.r_peak_positions_.to_numpy(),\n        datapoint.r_peak_positions_.to_numpy(),\n        tolerance=tolerance_s * datapoint.sampling_rate_hz,\n    )\n    precision, recall, f1_score = precision_recall_f1_score(matches)\n    return {\"precision\": precision, \"recall\": recall, \"f1_score\": f1_score}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## The Parameters\nThe last step before running the GridSearch is to select the parameter values we want to test.\nFor this, we can directly use sklearn's `ParameterGrid`.\n\nIn this example, we will just test three values for the `high_pass_filter_cutoff_hz` parameter of the algorithm.\nAs this is a parameter of the nested `algorithm` parameter, we use the `__` syntax to set it\n(`algorithm__high_pass_filter_cutoff_hz`).\n\n"
      ]
    },
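    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conceptually, a parameter grid is the Cartesian product of all listed values, and every resulting dict is one\ncandidate.\nThe following simplified sketch uses the made-up helpers `sketch_parameter_grid` and `sketch_split_nested` (not\nsklearn's or tpcp's code) to show what the grid expands to and how a `__`-separated name can be read as an outer\nparameter plus a nested parameter.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import itertools\n\n\ndef sketch_parameter_grid(grid):\n    # One dict per combination of the listed values (Cartesian product); keys are sorted for a deterministic order.\n    keys = sorted(grid)\n    return [dict(zip(keys, values)) for values in itertools.product(*(grid[key] for key in keys))]\n\n\ndef sketch_split_nested(name):\n    # Split at the first `__`: the part before names the outer parameter, the rest is passed on to the nested object.\n    outer, _, inner = name.partition('__')\n    return outer, inner\n\n\nprint(sketch_parameter_grid({'algorithm__high_pass_filter_cutoff_hz': [0.25, 0.5, 1]}))\n# Two parameters with 2 and 3 values result in 2 * 3 = 6 candidates.\ntoy_2d_grid = sketch_parameter_grid({'a': [1, 2], 'b': ['x', 'y', 'z']})\nprint(len(toy_2d_grid))\nprint(sketch_split_nested('algorithm__high_pass_filter_cutoff_hz'))"
      ]
    },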
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from sklearn.model_selection import ParameterGrid\n\nparameters = ParameterGrid({\"algorithm__high_pass_filter_cutoff_hz\": [0.25, 0.5, 1]})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Running the GridSearch\nNow we have all the pieces to run the GridSearch.\nAfter initializing, we can use `optimize` to run the GridSearch.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>If the score function returns a dictionary of scores, `return_optimized` must be set to the name of the score\n          that should be used to decide on the best parameter set.</p></div>\n\n"
      ]
    },
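    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conceptually, the grid search evaluates every candidate on every datapoint, averages each score over all datapoints,\nand picks the candidate with the highest mean value of the selected score.\nThe following cell is a strongly simplified, hypothetical sketch of that loop with a made-up score function and\nmade-up numbers as datapoints; `rank_by` plays the role of the score name passed as `return_optimized` in the real\ncall. tpcp's `GridSearch` additionally takes care of things like cloning the pipeline and storing the detailed results.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from statistics import mean\n\n\ndef sketch_grid_search(candidates, datapoints, score_func, rank_by):\n    # Hypothetical, strongly simplified exhaustive grid search (not tpcp's implementation).\n    all_results = []\n    for params in candidates:\n        # One score dict per datapoint ...\n        per_datapoint = [score_func(params, datapoint) for datapoint in datapoints]\n        # ... averaged over all datapoints, separately for each score name.\n        mean_scores = {name: mean(scores[name] for scores in per_datapoint) for name in per_datapoint[0]}\n        all_results.append((params, mean_scores))\n    # Higher is better: the candidate with the largest mean value of `rank_by` wins.\n    best_params, best_scores = max(all_results, key=lambda result: result[1][rank_by])\n    return best_params, best_scores, all_results\n\n\ndef toy_score(params, datapoint):\n    # Made-up score: the closer the cutoff is to the (invented) optimum stored as datapoint, the better.\n    error = abs(params['cutoff'] - datapoint)\n    return {'f1_score': 1 - error, 'neg_error': -error}\n\n\ntoy_best, toy_best_scores, toy_all_results = sketch_grid_search(\n    [{'cutoff': 0.25}, {'cutoff': 0.5}, {'cutoff': 1}],\n    datapoints=[0.4, 0.6],\n    score_func=toy_score,\n    rank_by='f1_score',\n)\nprint(toy_best, toy_best_scores)"
      ]
    },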
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from tpcp.optimize import GridSearch\n\ngs = GridSearch(pipe, parameters, scoring=score, return_optimized=\"f1_score\")\ngs = gs.optimize(example_data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The main results are stored in `gs_results_`.\nThis dictionary contains the mean performance per parameter combination, the rank for each parameter combination,\nand the performance for each individual data point (in our case a single recording of one participant).\nIt can be displayed directly or converted to a `pandas.DataFrame` for easier inspection.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "results = gs.gs_results_\nresults"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pd.DataFrame(results)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Further, `best_params_` contains the best parameter combination, and the `optimized_pipeline_` attribute holds an\ninstance of the pipeline initialized with this parameter combination.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Best parameter combination:\", gs.best_params_)\nprint(\"Parameters of the optimized pipeline:\", gs.optimized_pipeline_.get_params())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To run the optimized pipeline, we can directly use the `run`/`safe_run` method on the GridSearch object.\nThis makes it possible to use the `GridSearch` as a replacement for your pipeline object with minimal code changes.\n\nIf you call `run`/`safe_run` (or `score`, for that matter) before the optimization, an error is raised.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "r_peaks = gs.safe_run(example_data[0]).r_peak_positions_\nr_peaks"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}