{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Regularization path of L1- Logistic Regression\n\n\nTrain l1-penalized logistic regression models on a binary classification\nproblem derived from the Iris dataset.\n\nThe models are ordered from strongest regularized to least regularized. The 4\ncoefficients of the models are collected and plotted as a \"regularization\npath\": on the left-hand side of the figure (strong regularizers), all the\ncoefficients are exactly 0. When regularization gets progressively looser,\ncoefficients can get non-zero values one after the other.\n\nHere we choose the liblinear solver because it can efficiently optimize for the\nLogistic Regression loss with a non-smooth, sparsity inducing l1 penalty.\n\nAlso note that we set a low value for the tolerance to make sure that the model\nhas converged before collecting the coefficients.\n\nWe also use warm_start=True which means that the coefficients of the models are\nreused to initialize the next model fit to speed-up the computation of the\nfull-path.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from sklearn import datasets\n\niris = datasets.load_iris()\nX = iris.data\ny = iris.target\nfeature_names = iris.feature_names"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Here we remove the third class to make the problem a binary classification\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "X = X[y != 2]\ny = y[y != 2]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute regularization path\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.svm import l1_min_c\n\ncs = l1_min_c(X, y, loss=\"log\") * np.logspace(0, 1, 16)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create a pipeline with `StandardScaler` and `LogisticRegression`, to normalize\nthe data before fitting a linear model, in order to speed-up convergence and\nmake the coefficients comparable. Also, as a side effect, since the data is now\ncentered around 0, we don't need to fit an intercept.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "clf = make_pipeline(\n    StandardScaler(),\n    LogisticRegression(\n        l1_ratio=1,\n        solver=\"liblinear\",\n        tol=1e-6,\n        max_iter=int(1e6),\n        warm_start=True,\n        fit_intercept=False,\n    ),\n)\ncoefs_ = []\nfor c in cs:\n    clf.set_params(logisticregression__C=c)\n    clf.fit(X, y)\n    coefs_.append(clf[\"logisticregression\"].coef_.ravel().copy())\n\ncoefs_ = np.array(coefs_)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Plot regularization path\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n\n# Colorblind-friendly palette (IBM Color Blind Safe palette)\ncolors = [\"#648FFF\", \"#785EF0\", \"#DC267F\", \"#FE6100\"]\n\nplt.figure(figsize=(10, 6))\nfor i in range(coefs_.shape[1]):\n    plt.semilogx(cs, coefs_[:, i], marker=\"o\", color=colors[i], label=feature_names[i])\n\nymin, ymax = plt.ylim()\nplt.xlabel(\"C\")\nplt.ylabel(\"Coefficients\")\nplt.title(\"Logistic Regression Path\")\nplt.legend()\nplt.axis(\"tight\")\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.14"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}