
Logistic Regression

Logistic Regression is a statistical method used in machine learning for binary classification tasks, where the goal is to predict one of two possible outcomes, typically represented as 0 or 1 (it can also be extended to handle more than two classes). It is commonly used for problems such as spam detection, medical diagnosis, credit risk analysis, and customer churn analysis.

It is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. It uses the logit function, a.k.a. log-odds (the inverse of the logistic function), to model the probabilities of the possible outcomes of a single trial, i.e. a binary outcome (such as 0 or 1), based on one or more predictor variables. More specifically, it models the logarithm of the odds of the outcome.

The logistic (sigmoid) function is defined as:

$$\sigma(X) = \frac{1}{1 + e^{-X}}$$

$$\sigma(X) = \frac{1}{1 + e^{-(b + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}$$

where

  • input $X = b + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

    • $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients (weights) of the model, and $b$ is the bias

    • $x_1, x_2, \dots, x_n$ are the feature values

The logistic function compresses the input X into the range (0, 1), which can then be interpreted as a probability.
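For example, here is a minimal sketch (plain Python) showing how the sigmoid squashes large negative and large positive inputs toward 0 and 1:

import math

def sigmoid(x):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

for x in [-10, -1, 0, 1, 10]:
    print(f"sigmoid({x:>3}) = {sigmoid(x):.5f}")

"""
sigmoid(-10) = 0.00005
sigmoid( -1) = 0.26894
sigmoid(  0) = 0.50000
sigmoid(  1) = 0.73106
sigmoid( 10) = 0.99995
"""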

The logit function is the inverse of the logistic function and recovers the linear input value:

$$\mathrm{logit}(p) = \sigma^{-1}(p) = \ln\frac{p}{1-p} = b + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

  • where p is a probability (0 < p < 1).

Note that Logistic Regression assumes a linear relationship between the log-odds of the target and the predictors.
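As a quick check (a minimal sketch), applying the logit to a probability and then the sigmoid to the result recovers the original value, which confirms that the two functions are inverses:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    # Log-odds: the inverse of the sigmoid, defined for 0 < p < 1
    return math.log(p / (1 - p))

p = 0.73
x = logit(p)                # ~0.9946, the linear input that yields p
print(f"{sigmoid(x):.4f}")  # 0.7300 -- back to the original probability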

Understanding Odds and Odds Ratio

The odds of an event (e.g., churn) is defined as the ratio of the probability of the event occurring (p) to the probability of the event not occurring (1 - p). Mathematically, the odds (O) are expressed as:

$$O = \frac{p}{1-p}$$

where $p$ is the probability of the event (e.g., the probability of churn).
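As a quick worked example: if the probability of churn is p = 0.75, the odds are 0.75 / 0.25 = 3, i.e. churn is three times as likely as no churn:

p = 0.75            # probability of the event (e.g., churn)
odds = p / (1 - p)  # odds of the event occurring vs. not occurring
print(odds)         # 3.0 -- the event is 3x as likely as its absence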

The odds ratio quantifies the change in the odds of the outcome (e.g., churn) associated with a one-unit change in a predictor variable. For a logistic regression model with bias $b$ and coefficients $\beta_1, \beta_2, \dots, \beta_n$ for predictor variables $x_1, x_2, \dots, x_n$, the odds ratio (OR) for a predictor $x_i$ is calculated as:

$$OR_{x_i} = e^{\beta_i}$$

where $\beta_i$ is the coefficient (parameter) of the predictor $x_i$.

Interpreting Odds Ratios in Logistic Regression

An odds ratio of $OR_{x_i} = e^{\beta_i}$ indicates how the odds of the outcome change with a one-unit increase in the predictor $x_i$.

  • If $OR_{x_i} > 1$: a one-unit increase in $x_i$ is associated with higher odds (increased likelihood) of the outcome.

  • If $OR_{x_i} < 1$: a one-unit increase in $x_i$ is associated with lower odds (decreased likelihood) of the outcome.

Suppose $\beta_1 = 0.5$ for a predictor variable $x_1$. The odds ratio $OR_{x_1} = e^{0.5} \approx 1.6487$ means that for each one-unit increase in $x_1$, the odds of the outcome (e.g., churn) increase by approximately 64.87%.

Now, suppose the coefficient $\beta_2 = -0.2$ for the feature $x_2$ (i.e., years as a customer). The odds ratio $OR_{x_2} = e^{-0.2} \approx 0.8187$ means that for each one-unit increase in $x_2$, the odds of churn decrease by approximately 18.13%.
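These odds ratios can be computed directly from the coefficients. Below is a minimal sketch using the hypothetical coefficient values from the two examples above:

import math

# Hypothetical coefficients from the examples above
coefficients = {"x1": 0.5, "x2 (years as customer)": -0.2}

for name, beta in coefficients.items():
    odds_ratio = math.exp(beta)      # OR = e^beta
    change = (odds_ratio - 1) * 100  # % change in odds per one-unit increase
    print(f"{name}: OR = {odds_ratio:.4f} ({change:+.2f}% change in odds)")

"""
x1: OR = 1.6487 (+64.87% change in odds)
x2 (years as customer): OR = 0.8187 (-18.13% change in odds)
"""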

Logistic Regression Classifier: Math Example

From the TensorFlow documentation:

A logistic regression model uses the following two-step architecture:

  1. The model generates a raw prediction (y') by applying a linear function of input features.

  2. The model uses that raw prediction as input to a sigmoid function, which converts the raw prediction to a value between 0 and 1, exclusive.

Like any regression model, a logistic regression model predicts a number. However, this number typically becomes part of a binary classification model as follows:

  • If the predicted number is greater than the classification threshold, the binary classification model predicts the positive class.

  • If the predicted number is less than the classification threshold, the binary classification model predicts the negative class.

Assume that we are building a Logistic Regression classifier for a brand to use in customer churn prediction. Our hypothetical model uses three features:

import math

# Suppose we had a logistic regression model
# 1) with three features with the following values:
x1 = 1
x2 = 3
x3 = 2


# 2) which learned the following bias and weights:
b = 2
w1 = 1
w2 = -2
w3 = 2

# Solve the linear equation (the raw prediction, i.e. the log-odds)
X = b + w1*x1 + w2*x2 + w3*x3

# Probability using the logistic (sigmoid) function
prob_logistic = 1 / (1 + math.e**-X)
print(f"Logistic function probability: {prob_logistic}")

# The same probability via the algebraically equivalent form e^X / (1 + e^X)
prob_equivalent = math.e**X / (1 + math.e**X)
print(f"Equivalent form probability: {prob_equivalent}")

"""
Logistic function probability: 0.7310585786300049
Equivalent form probability: 0.7310585786300049
"""

The probability output of the logistic regression model for this particular instance is ~0.73 (the model estimates a 73% chance that the customer will churn). If the classification threshold for probability p (default value = 0.5) is lower than this value, then this instance is predicted as churn (i.e., class 1, the positive class). If the predefined threshold is higher than the model's output, the instance is predicted as no-churn (i.e., class 0, the negative class). This shows the importance of the class probability threshold and how it affects the number of false negatives and false positives in classification tasks.
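The effect of the threshold can be seen in a minimal sketch that applies different threshold values to the ~0.73 probability computed above:

prob = 0.7310585786300049  # model output from the example above

for threshold in [0.5, 0.7, 0.9]:
    label = "churn (1)" if prob > threshold else "no-churn (0)"
    print(f"threshold={threshold}: predicted {label}")

"""
threshold=0.5: predicted churn (1)
threshold=0.7: predicted churn (1)
threshold=0.9: predicted no-churn (0)
"""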

What are the considerations?

When using logistic regression for predictive modeling, there are several important considerations and factors to take into account to ensure the model's effectiveness, reliability, and interpretability. Here are key considerations for logistic regression:

  1. Binary Outcome: Logistic regression is suitable for binary classification tasks where the target variable (outcome) is binary (e.g., yes/no, 0/1, churn/no churn). Ensure that the problem is well-defined as a binary classification task before applying logistic regression.

  2. Linear Relationship: Logistic regression assumes a linear relationship between the log-odds of the outcome and the predictor variables. It's important to assess whether this assumption holds true for your dataset. Consider transformations or interactions if nonlinear relationships are suspected.

  3. Feature Selection: Carefully select relevant features (predictor variables) that are likely to influence the outcome. Feature selection can help improve model performance, reduce overfitting, and enhance interpretability. Use domain knowledge and exploratory data analysis to identify meaningful predictors.

  4. Handling Missing Data: Address missing values in the dataset appropriately before applying logistic regression. Options include imputation (replacing missing values with estimated values), deletion of records with missing data, or treating missing values as a separate category.

  5. Categorical Variables: Encode categorical variables using appropriate techniques such as one-hot encoding or label encoding. Ensure that categorical variables are transformed into a format that logistic regression can interpret correctly.

  6. Outliers: Identify and handle outliers in the dataset, as logistic regression can be sensitive to extreme values. Consider using robust methods for outlier detection and treatment.

  7. Collinearity: Check for multicollinearity (high correlation) among predictor variables, as it can affect the stability and interpretability of logistic regression coefficients. Use techniques such as variance inflation factor (VIF) analysis to assess collinearity.

  8. Model Interpretability: Logistic regression provides interpretable results in terms of coefficient estimates and odds ratios. Take advantage of these interpretable features to understand how each predictor influences the likelihood of the outcome.

  9. Regularization: Consider applying regularization techniques (e.g., L1 or L2 regularization) to control overfitting and improve model generalization. Regularization penalizes large coefficients and can lead to more robust models.

  10. Evaluation Metrics: Choose appropriate evaluation metrics for assessing model performance, such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). Select metrics that align with the specific goals and requirements of the application.

  11. Cross-Validation: Use techniques like cross-validation to assess the model's stability and generalizability. Cross-validation helps estimate the model's performance on unseen data and can guide hyperparameter tuning.

  12. Class Imbalance: Address class imbalance if present in the dataset (i.e., unequal distribution of classes). Consider techniques such as stratified sampling, resampling methods (e.g., oversampling the minority class, undersampling the majority class), or using class weights during model training; the sketch after this list illustrates class weights together with regularization and cross-validation.
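As a hedged illustration of several of these considerations together (regularization, class imbalance, and cross-validation), here is a minimal scikit-learn sketch; the synthetic dataset and parameter choices are assumptions for demonstration, not a definitive recipe:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced binary dataset (assumed for illustration)
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=42
)

# L2-regularized logistic regression; class_weight="balanced"
# reweights classes inversely to their frequencies
model = LogisticRegression(penalty="l2", C=1.0, class_weight="balanced")

# 5-fold cross-validated AUC-ROC to estimate generalization
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"AUC-ROC: {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit on the full data to inspect the coefficients as odds ratios
model.fit(X, y)
print("Odds ratios:", np.exp(model.coef_).round(3))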

By carefully considering these factors and addressing potential challenges, logistic regression can be effectively applied to solve binary classification problems in various domains. Understanding the nuances of logistic regression helps ensure the reliability and robustness of the predictive model.

Conclusion

Each classification algorithm has its strengths and weaknesses, making it important to choose the most suitable algorithm based on the specific characteristics of the dataset and the requirements of the task at hand. Experimentation, model evaluation, and tuning are essential for achieving optimal performance.
