Logistic Regression

Logistic Regression is a statistical method used in machine learning for binary classification tasks (it can be extended to handle more than two classes), where the goal is to predict one of two possible outcomes, typically represented as 0 or 1. It is commonly used for problems like spam detection, medical diagnosis, credit risk analysis, and customer churn analysis.

It is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. It uses the logit function, a.k.a. log-odds (the inverse of the logistic function), to model the probabilities describing the possible outcomes of a single trial, i.e., a binary outcome (such as 0 or 1) based on one or more predictor variables. More specifically, it models the logarithm of the odds.

The logistic (sigmoid) function is defined as:

$$\sigma(X) = \frac{1}{1 + e^{-X}}$$

$$\sigma(X) = \frac{1}{1 + e^{-(b + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n)}}$$

where

  • input $X = b + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$

    • $\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients (weights) of the model, and $b$ is the bias

    • $x_1, x_2, \ldots, x_n$ are the feature values

The logistic function compresses the input X into the range (0, 1), which can then be interpreted as a probability.

The logit function is the inverse of the logistic function and equals the input value:

$$\operatorname{logit}(p) = \sigma^{-1}(p) = \ln\frac{p}{1-p} = b + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$

  • where $p$ is a probability ($0 < p < 1$).

Note that Logistic Regression assumes a linear relationship between the log-odds of the target and the predictor variables.
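
As a quick sanity check, here is a minimal sketch (standard library only; the input value 1.5 is an arbitrary example) showing that the logit function recovers the input of the logistic function:

import math

def sigmoid(x):
    """Logistic function: maps any real x into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def logit(p):
    """Logit (log-odds) function: inverse of the sigmoid, for 0 < p < 1."""
    return math.log(p / (1 - p))

x = 1.5               # e.g., x = b + β1*x1 + ... + βn*xn for some instance
p = sigmoid(x)        # probability in (0, 1), ā‰ˆ 0.8176
print(logit(p))       # ā‰ˆ 1.5, recovering the input (up to floating-point error)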

Understanding Odds and Odds Ratio

The odds of an event (e.g., churn) are defined as the ratio of the probability of the event occurring (p) to the probability of the event not occurring (1 - p). Mathematically, the odds $O$ are expressed as:

$$O = \frac{p}{1-p}$$

where $p$ is the probability of the event (e.g., the probability of churn).

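A minimal illustration of the relationship between probability and odds (the churn probability here is a made-up value):

p_churn = 0.75                  # hypothetical probability of churn
odds = p_churn / (1 - p_churn)
print(odds)                     # 3.0 -> churn is 3 times as likely as no churn

# converting back: p = odds / (1 + odds)
print(odds / (1 + odds))        # 0.75
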
Interpreting Odds Ratios in Logistic Regression

The odds ratio quantifies the change in the odds of the outcome (e.g., churn) associated with a one-unit change in a predictor variable. For a logistic regression model with coefficients $b, \beta_1, \beta_2, \ldots, \beta_n$ for predictor variables $x_1, x_2, \ldots, x_n$, the odds ratio (OR) for a predictor $x_i$ is calculated as:

$$OR_{x_i} = e^{\beta_i}$$

where $\beta_i$ is the coefficient (parameter) of the predictor $x_i$. The odds ratio $OR_{x_i} = e^{\beta_i}$ indicates how the odds of the outcome change with a one-unit increase in the predictor $x_i$:

  • If $OR_{x_i} > 1$: a one-unit increase in $x_i$ is associated with higher odds (increased likelihood) of the outcome.

  • If $OR_{x_i} < 1$: a one-unit increase in $x_i$ is associated with lower odds (decreased likelihood) of the outcome.

Suppose $\beta_1 = 0.5$ for a predictor variable $x_1$. The odds ratio $OR_{x_1} = e^{0.5} \approx 1.648$ means that for each one-unit increase in $x_1$, the odds of the outcome (e.g., churn) increase by approximately 64.8%.

Now suppose the coefficient $\beta_2 = -0.2$ for the feature $x_2$ (e.g., years as a customer). The odds ratio $OR_{x_2} = e^{-0.2} \approx 0.8187$ means that for each one-unit increase in $x_2$, the odds of churn decrease by approximately 18.13%.

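The two worked examples above can be reproduced directly; the coefficients are the hypothetical values from the text:

import math

beta_1 = 0.5       # hypothetical coefficient of x1
beta_2 = -0.2      # hypothetical coefficient of x2 (years as a customer)

or_x1 = math.exp(beta_1)
or_x2 = math.exp(beta_2)

print(f"OR_x1 = {or_x1:.4f} -> odds increase by ~{or_x1 - 1:.2%} per unit of x1")
print(f"OR_x2 = {or_x2:.4f} -> odds decrease by ~{1 - or_x2:.2%} per unit of x2")

# OR_x1 = 1.6487 -> odds increase by ~64.87% per unit of x1
# OR_x2 = 0.8187 -> odds decrease by ~18.13% per unit of x2
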
Logistic Regression Classifier: Math Example

From TensorFlow:

A logistic regression model uses the following two-step architecture:

  1. The model generates a raw prediction (y') by applying a linear function of input features.

  2. The model uses that raw prediction as input to a sigmoid function, which converts the raw prediction to a value between 0 and 1, exclusive.

Like any regression model, a logistic regression model predicts a number. However, this number typically becomes part of a binary classification model as follows:

  • If the predicted number is less than the classification threshold, the binary classification model predicts the negative class.

  • If the predicted number is greater than the classification threshold, the binary classification model predicts the positive class.

Assume that we are building a Logistic Regression classifier for a brand, to be used in customer churn prediction. Our hypothetical model uses three features:

import math

# Suppose we had a logistic regression model
# 1) with three features taking the following values:
x1 = 1
x2 = 3
x3 = 2

# 2) which learned the following bias and weights:
b = 2
w1 = 1
w2 = -2
w3 = 2

# solve the linear equation for the raw prediction
X = b + w1*x1 + w2*x2 + w3*x3

# probability using the logistic (sigmoid) function
prob_logistic = 1 / (1 + math.exp(-X))
print(f"Logistic function probability: {prob_logistic}")

# probability obtained by inverting the logit: p = e^X / (1 + e^X),
# which is algebraically identical to the sigmoid
prob_logit = math.exp(X) / (1 + math.exp(X))
print(f"Logit function probability: {prob_logit}")

"""
Logistic function probability: 0.7310585786300049
Logit function probability: 0.7310585786300049
"""

The probability output of the model for this particular instance is ~0.73, i.e., the model estimates a 73% chance that the customer will churn. If the classification threshold for probability p (default value = 0.5) is lower than this value, the instance is predicted as churn (class 1, the positive class). If the predefined threshold is higher than the model's output, the instance is predicted as no-churn (class 0, the negative class). This shows the importance of the classification threshold and how it affects the number of false negatives and false positives in classification.

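A small sketch of how the threshold choice flips the predicted class, reusing the probability from the example above (the 0.75 threshold is just an arbitrary alternative for illustration):

prob = 0.7310585786300049   # model output from the example above

for threshold in (0.5, 0.75):
    predicted = 1 if prob >= threshold else 0
    label = "churn (positive)" if predicted == 1 else "no-churn (negative)"
    print(f"threshold={threshold}: class {predicted} -> {label}")

# threshold=0.5: class 1 -> churn (positive)
# threshold=0.75: class 0 -> no-churn (negative)
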
What are the considerations?

When using logistic regression for predictive modeling, there are several important factors to take into account to ensure the model's effectiveness, reliability, and interpretability. Here are the key considerations:

  1. Binary Outcome: Logistic regression is suitable for binary classification tasks where the target variable (outcome) is binary (e.g., yes/no, 0/1, churn/no churn). Ensure that the problem is well-defined as a binary classification task before applying logistic regression.

  2. Linear Relationship: Logistic regression assumes a linear relationship between the log-odds of the outcome and the predictor variables. It's important to assess whether this assumption holds true for your dataset. Consider transformations or interactions if nonlinear relationships are suspected.

  3. Feature Selection: Carefully select relevant features (predictor variables) that are likely to influence the outcome. Feature selection can help improve model performance, reduce overfitting, and enhance interpretability. Use domain knowledge and exploratory data analysis to identify meaningful predictors.

  4. Handling Missing Data: Address missing values in the dataset appropriately before applying logistic regression. Options include imputation (replacing missing values with estimated values), deletion of records with missing data, or treating missing values as a separate category.

  5. Categorical Variables: Encode categorical variables using appropriate techniques such as one-hot encoding or label encoding. Ensure that categorical variables are transformed into a format that logistic regression can interpret correctly.

  6. Outliers: Identify and handle outliers in the dataset, as logistic regression can be sensitive to extreme values. Consider using robust methods for outlier detection and treatment.

  7. Collinearity: Check for multicollinearity (high correlation) among predictor variables, as it can affect the stability and interpretability of logistic regression coefficients. Use techniques such as variance inflation factor (VIF) analysis to assess collinearity.

  8. Model Interpretability: Logistic regression provides interpretable results in terms of coefficient estimates and odds ratios. Take advantage of these interpretable features to understand how each predictor influences the likelihood of the outcome.

  9. Regularization: Consider applying regularization techniques (e.g., L1 or L2 regularization) to control overfitting and improve model generalization. Regularization penalizes large coefficients and can lead to more robust models (see the sketch after this list).

  10. Evaluation Metrics: Choose appropriate evaluation metrics for assessing model performance, such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). Select metrics that align with the specific goals and requirements of the application.

  11. Cross-Validation: Use techniques like cross-validation to assess the model's stability and generalizability. Cross-validation helps estimate the model's performance on unseen data and can guide hyperparameter tuning.

  12. Class Imbalance: Address class imbalance if present in the dataset (i.e., unequal distribution of classes). Consider techniques such as stratified sampling, resampling methods (e.g., oversampling minority class, undersampling majority class), or using class weights during model training.
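
To tie several of these considerations together (regularization, class imbalance, cross-validation, and an AUC-based evaluation metric), here is a minimal scikit-learn sketch on synthetic data; the dataset and all parameter values are illustrative placeholders, not recommendations:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic, imbalanced binary data standing in for real churn data
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.8, 0.2], random_state=42
)

# scale features, then fit an L2-regularized logistic regression;
# C is the inverse regularization strength (smaller C = stronger penalty),
# and class_weight="balanced" reweights classes to counter the imbalance
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, class_weight="balanced", max_iter=1000),
)

# 5-fold cross-validated ROC AUC to estimate generalization performance
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"CV ROC AUC: {scores.mean():.3f} (+/- {scores.std():.3f})")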

By carefully considering these factors and addressing potential challenges, logistic regression can be effectively applied to solve binary classification problems in various domains. Understanding the nuances of logistic regression helps ensure the reliability and robustness of the predictive model.

Conclusion

Each classification algorithm has its strengths and weaknesses, making it important to choose the most suitable algorithm based on the specific characteristics of the dataset and the requirements of the task at hand. Experimentation, model evaluation, and tuning are essential for achieving optimal performance.

