Distributions

Last updated 1 year ago


The following are some of the most common distributions encountered in a business or e-commerce setting. The conditions listed for each distribution are not always strictly necessary, but they generally need to hold for the distribution to be applicable and for its parameters to be interpretable.

1. Normal (Gaussian) Distribution

The Normal Distribution, also known as the Gaussian Distribution, is a continuous distribution that is widely used to model real-valued variables that are symmetric and bell-shaped. It is defined by two parameters: the mean (μ) and the standard deviation (σ). The following business metrics are often approximated by a Normal Distribution, especially when they are averaged or aggregated over many observations (raw per-customer values are frequently right-skewed):

  • Sales revenue

  • Customer lifetime value (CLV)

  • Average order value (AOV)

  • Product prices

  • Customer demographics (age, income, etc.)

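Apart from business metrics, the Normal Distribution is also commonly used to model variables such as heights, weights, and IQ scores, where the majority of the data points cluster around the mean, with fewer extreme values. Stock returns over short horizons are often treated as approximately normal as well, whereas stock prices are usually better described by the Lognormal Distribution (section 6).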

Conditions that need to be met:

  1. Independence: Each data point is independent of the others.

  2. Identical Distribution: Each data point comes from the same distribution.

  3. Mean and Variance: The mean (μ) and variance (σ^2) are finite and constant.

  4. Symmetry: The distribution is symmetric around the mean.

  5. Bell-shaped: The distribution has a bell-shaped curve.
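
As a minimal sketch of how the two parameters are used in practice, the snippet below relies on Python's standard-library `statistics.NormalDist`. The metric (a daily average order value) and the values μ = 50 and σ = 10 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical metric: daily average order value, assumed normal
# with mean 50 and standard deviation 10 (illustrative values only).
aov = NormalDist(mu=50.0, sigma=10.0)

# Probability that a value falls within one standard deviation of the mean.
within_one_sd = aov.cdf(60.0) - aov.cdf(40.0)

# Value below which 95% of observations are expected to fall.
p95 = aov.inv_cdf(0.95)

print(f"P(40 <= X <= 60) = {within_one_sd:.4f}")
print(f"95th percentile  = {p95:.2f}")
```

If real data are strongly skewed, these numbers will be misleading, which is one reason to check the symmetry condition before applying the model.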


2. Binomial Distribution

The Binomial Distribution is a discrete distribution that models the number of successes in a fixed number of trials, such as the number of conversions in a fixed number of website visits. It is defined by two parameters: the number of trials (n) and the probability of success (p). The Binomial Distribution is commonly used to model counts of binary outcomes, such as 0/1, yes/no, or success/failure. The following business metrics are built on counts that can be modeled with a Binomial Distribution:

  • Conversion rates (e.g., click-through rates, checkout rates)

  • Customer churn rates

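  • Product ratings reduced to a binary outcome (e.g., 4-5 stars vs. 1-3 stars); raw 1-5 star ratings have more than two outcomes and are not binomial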

Conditions that need to be met:

  1. Independence: Each trial is independent of the others.

  2. Fixed Number of Trials: The number of trials (n) is fixed.

  3. Constant Probability: The probability of success (p) is constant for each trial.

  4. Binary Outcomes: Each trial has only two possible outcomes (success or failure).
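
The probability of exactly k successes in n trials is C(n, k) · p^k · (1 − p)^(n − k). The sketch below evaluates this formula with the standard library; the figures (100 visits, a 5% conversion probability) are hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical scenario: 100 visits, each converting with probability 0.05.
n_visits, p_convert = 100, 0.05

# Expected number of conversions is n * p.
expected_conversions = n_visits * p_convert

# Probability of seeing at most 2 conversions.
p_at_most_2 = sum(binomial_pmf(k, n_visits, p_convert) for k in range(3))

print(f"expected conversions = {expected_conversions:.1f}")
print(f"P(X <= 2)            = {p_at_most_2:.4f}")
```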

3. Poisson Distribution

The Poisson Distribution is a discrete distribution that models the number of events that occur in a fixed interval, such as the number of website visits, orders, or phone calls. It is defined by a single parameter, lambda (λ), which represents the average number of events per interval and equals both the mean and the variance of the distribution. The Poisson Distribution assumes that events are independent and occur at a constant rate, making it a useful model for count data. The following business metrics are often modeled with a Poisson Distribution:

  • Number of orders per customer in a fixed period

  • Number of items per order

  • Website traffic (visits, page views, etc.)

  • Customer complaints or returns

Conditions that need to be met:

  1. Independence: Each event is independent of the others.

  2. Constant Rate: The average rate of events (λ) is constant over the interval.

  3. Fixed Interval: The events occur in a fixed interval of time or space.

  4. Rare Events: The probability of an event occurring in a small interval is small.
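
The probability of observing k events is e^(−λ) · λ^k / k!. The following sketch evaluates it directly and checks numerically that the mean and variance both come out as λ. The rate of 3 complaints per day is a hypothetical value.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with average rate lam per interval."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # hypothetical: 3 complaints per day on average

# Summing over 0..59 is a truncation; the remaining tail mass is negligible for lam = 3.
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

# Probability of an unusually busy day with more than 5 complaints.
p_more_than_5 = 1.0 - sum(poisson_pmf(k, lam) for k in range(6))

print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"P(X > 5) = {p_more_than_5:.4f}")
```

When the observed variance of a count metric is much larger than its mean, the constant-rate or independence condition is probably violated and a plain Poisson model may fit poorly.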

4. Exponential Distribution

The Exponential Distribution is a continuous distribution that models the time between events in a Poisson process, such as the time between website visits or between repeat purchases. It is defined by a single parameter, lambda (λ), which represents the rate at which events occur; the mean time between events is 1/λ. The Exponential Distribution is memoryless, meaning that the remaining waiting time until the next event does not depend on how much time has already elapsed. The following business metrics are often modeled with an Exponential Distribution:

  • Time between orders

  • Time spent on the website

  • Customer loyalty (time until a repeat purchase)

Conditions that need to be met:

  1. Independence: Each event is independent of the others.

  2. Constant Rate: The average rate of events (λ) is constant over time.

  3. Memoryless: The remaining waiting time until the next event does not depend on how long one has already waited.

  4. Continuous Time: The events occur in continuous time.
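
The memoryless property can be checked numerically from the survival function P(T > t) = e^(−λt): the probability of waiting a further t units, given that s units have already passed, equals the unconditional probability of waiting t units. The rate and the times below are hypothetical values.

```python
from math import exp

def survival(t: float, rate: float) -> float:
    """P(T > t) for an exponential waiting time with the given event rate."""
    return exp(-rate * t)

rate = 0.5        # hypothetical: 0.5 orders per hour
mean_gap = 1 / rate  # mean time between orders, in hours

s, t = 3.0, 1.5   # already waited s hours; interested in t more hours

# P(T > s + t | T > s) = P(T > s + t) / P(T > s)
conditional = survival(s + t, rate) / survival(s, rate)
unconditional = survival(t, rate)

print(f"mean gap               = {mean_gap:.1f} hours")
print(f"P(T > s + t | T > s)   = {conditional:.4f}")
print(f"P(T > t)               = {unconditional:.4f}")
```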

5. Pareto Distribution (Power Law)

The Pareto Distribution, also known as the Power Law Distribution, is a continuous distribution that models values with a long tail, such as customer value or product popularity. It is defined by two parameters: a scale parameter (x_m), which is the minimum possible value, and a shape parameter, alpha (α), which controls how heavy the tail is (the smaller α, the heavier the tail). The Pareto Distribution is commonly used to model variables that follow the 80/20 rule, where most values are small and a few extreme values account for most of the total. The following business metrics are often modeled with a Pareto Distribution:

  • Customer value (80/20 rule: 20% of customers generate 80% of revenue)

  • Product popularity (80/20 rule: 20% of products generate 80% of sales)

Conditions that need to be met:

  1. Heavy-tailed: The distribution has a heavy tail, meaning that extreme values are more common than in a normal distribution.

  2. Positive Values: The values are positive and bounded below by a positive minimum value.

  3. Scale Invariance: The distribution is scale-invariant, meaning that it looks the same at different scales.

  4. Alpha Parameter: The shape parameter (α) is greater than 0. The mean is finite only when α is greater than 1, and the variance only when α is greater than 2.
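
For an exact Pareto law with α > 1, the share of the total held by the top fraction q of the population is q^((α − 1)/α). The sketch below uses this to show which shape parameter reproduces the 80/20 rule. The helper name `top_share` is introduced here for illustration only, and real data rarely follow an exact Pareto law.

```python
from math import log

def top_share(q: float, alpha: float) -> float:
    """Share of the total held by the top fraction q under an exact Pareto law."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return q ** ((alpha - 1) / alpha)

# Shape parameter for which the top 20% hold exactly 80% of the total.
alpha_80_20 = log(5) / log(4)

share_top_20 = top_share(0.20, alpha_80_20)

# A lighter tail (larger alpha) concentrates less of the total at the top.
share_top_20_light_tail = top_share(0.20, 3.0)

print(f"alpha for 80/20 rule      = {alpha_80_20:.3f}")
print(f"top 20% share             = {share_top_20:.3f}")
print(f"top 20% share (alpha = 3) = {share_top_20_light_tail:.3f}")
```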

6. Lognormal Distribution

The Lognormal Distribution is a continuous distribution that models values with a long tail and positive skew, such as customer lifetime value or stock prices. It is defined by two parameters: mu (μ) and sigma (σ), which are the mean and standard deviation of the logarithm of the variable, not of the variable itself. The Lognormal Distribution is commonly used to model variables that have a natural lower bound of zero, such as prices or values. The following business metrics are often modeled with a Lognormal Distribution:

  • Customer lifetime value (CLV) with a long tail

  • Stock prices (if you're selling stocks or cryptocurrencies)

Conditions that need to be met:

  1. Positive Values: The values are positive.

  2. Skewed Distribution: The distribution is skewed to the right.

  3. Mean and Variance: The mean (μ) and variance (σ^2) of the logarithm of the values are finite and constant.

  4. Logarithmic Transformation: The logarithm of the values is normally distributed.
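
The logarithmic-transformation condition suggests a simple check: take logs of the data and see whether they look normal. The sketch below simulates lognormal values with the standard library and compares the mean and standard deviation of the logs against the parameters used. The values μ = 4.0 and σ = 0.8 and the fixed seed are hypothetical choices.

```python
import math
import random
import statistics

mu, sigma = 4.0, 0.8  # hypothetical mean and sd of log(customer lifetime value)

# Closed-form summaries: the median is exp(mu), the mean is exp(mu + sigma^2 / 2).
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

# Simulate values, then recover mu and sigma from the logarithms.
rng = random.Random(42)  # fixed seed so the run is repeatable
values = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]
logs = [math.log(v) for v in values]
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)

print(f"median = {median:.1f}, mean = {mean:.1f}")
print(f"mean of logs = {log_mean:.3f}, sd of logs = {log_sd:.3f}")
```

The mean exceeding the median is the right skew described in the conditions above.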

7. Weibull Distribution

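The Weibull Distribution is a continuous distribution that models lifetimes or survival times, such as product lifetimes or customer retention. It is defined by two parameters: alpha (α) and beta (β), which represent the shape and scale of the distribution, respectively. A shape parameter below 1 describes a failure rate that decreases over time, a value of 1 gives a constant failure rate (the Exponential Distribution), and a value above 1 describes a failure rate that increases over time. The Weibull Distribution is commonly used to model variables that have a natural lower bound of zero, such as lifetimes or survival times. The following business metrics are often modeled with a Weibull Distribution: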

  • Product lifetimes (e.g., warranty claims, returns)

  • Customer retention (time until churn)

Conditions that need to be met:

  1. Positive Values: The values are positive.

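  2. Skewed Distribution: The distribution is typically skewed to the right, although large values of the shape parameter can make it nearly symmetric or slightly left-skewed.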

  3. Shape Parameter: The shape parameter (α) is greater than 0.

  4. Scale Parameter: The scale parameter (β) is greater than 0.
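
With shape α and scale β, the survival function is S(t) = exp(−(t/β)^α) and the hazard (instantaneous failure rate) is h(t) = (α/β) · (t/β)^(α − 1). The sketch below is a minimal illustration of how the shape parameter changes the hazard over time. The function names and the 24-month scale are hypothetical. Note that Python's own `random.weibullvariate(alpha, beta)` uses the opposite naming (alpha is the scale, beta is the shape), so explicit argument names are used here.

```python
from math import exp

def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(T > t): probability that the lifetime exceeds t."""
    return exp(-((t / scale) ** shape))

def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at time t."""
    return (shape / scale) * (t / scale) ** (shape - 1)

scale = 24.0  # hypothetical characteristic lifetime, in months

for shape in (0.5, 1.0, 2.0):
    early = weibull_hazard(6.0, shape, scale)
    late = weibull_hazard(36.0, shape, scale)
    if late < early:
        trend = "decreasing"
    elif late == early:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"shape={shape}: hazard at 6 months={early:.4f}, at 36 months={late:.4f} ({trend})")
```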

Note that this list is not exhaustive, and the distribution that best fits a given metric may vary depending on the specific e-commerce business and its data. Understanding these distributions helps us model and analyze our data more effectively, which in turn makes informed business decisions easier.

Wikipedia Links:

  • Normal Distribution: https://en.wikipedia.org/wiki/Normal_distribution

  • Binomial Distribution: https://en.wikipedia.org/wiki/Binomial_distribution

  • Poisson Distribution: https://en.wikipedia.org/wiki/Poisson_distribution

  • Exponential Distribution: https://en.wikipedia.org/wiki/Exponential_distribution

  • Pareto Distribution: https://en.wikipedia.org/wiki/Pareto_distribution

  • Lognormal Distribution: https://en.wikipedia.org/wiki/Lognormal_distribution

  • Weibull Distribution: https://en.wikipedia.org/wiki/Weibull_distribution