Data Science Hub

Distributions

The following are some of the most common distributions one can encounter in a business or e-commerce setting. The conditions listed for each distribution are not always strictly necessary, but they generally need to hold for the distribution to be applicable and for its parameters to be interpretable.

1. Normal (Gaussian) Distribution

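The Normal Distribution, also known as the Gaussian Distribution, is a continuous distribution that is widely used to model real-valued variables that are symmetric and bell-shaped. It is defined by two parameters: the mean (μ) and the standard deviation (σ). The following business metrics are often treated as approximately normal, although raw revenue-type metrics tend to be right-skewed, so the approximation usually works best for averages over many observations or after a log transformation: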

Sales revenue
Customer lifetime value (CLV)
Average order value (AOV)
Product prices
Customer demographics (age, income, etc.)

Apart from the business metrics, the Normal Distribution is also commonly used to model variables such as heights, weights, IQ scores, and short-horizon stock returns, where the majority of the data points cluster around the mean, with fewer extreme values.

Conditions that need to be met:

Independence: Each data point is independent of the others.
Identical Distribution: Each data point comes from the same distribution.
Mean and Variance: The mean (μ) and variance (σ^2) are finite and constant.
Symmetry: The distribution is symmetric around the mean.
Bell-shaped: The distribution has a bell-shaped curve.

Wikipedia Link: https://en.wikipedia.org/wiki/Normal_distribution
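
As a minimal sketch of how the two parameters are used in practice, the snippet below relies on Python's standard-library statistics.NormalDist. The metric (a daily average order value) and the values μ = 50 and σ = 10 are assumptions chosen purely for illustration; they do not come from real data.

```python
from statistics import NormalDist

# Hypothetical metric: daily average order value, assumed ~ Normal(mu=50, sigma=10).
aov = NormalDist(mu=50.0, sigma=10.0)

# Probability that a day's AOV falls within one standard deviation of the mean.
p_within_1sd = aov.cdf(60.0) - aov.cdf(40.0)

# Probability of an unusually high day (more than two standard deviations above the mean).
p_above_70 = 1.0 - aov.cdf(70.0)

# Value below which 95% of days are expected to fall under this model.
q95 = aov.inv_cdf(0.95)

print(f"P(40 <= AOV <= 60) = {p_within_1sd:.4f}")
print(f"P(AOV > 70)        = {p_above_70:.4f}")
print(f"95th percentile    = {q95:.2f}")
```

If the real metric is visibly skewed, these probabilities will be off in the tails, which is one quick way to notice that the symmetry condition does not hold.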

2. Binomial Distribution

The Binomial Distribution is a discrete distribution that models the number of successes in a fixed number of trials, such as the number of conversions in a fixed number of website visits. It is defined by two parameters: the probability of success (p) and the number of trials (n). The Binomial Distribution is commonly used to model binary outcomes, such as 0/1, yes/no, or success/failure. The following business metrics can be modelled with a Binomial Distribution when the underlying successes are counted:

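Conversion counts behind conversion rates (e.g., click-through rates, checkout rates)
Number of churned customers behind customer churn rates
Product ratings collapsed to a binary outcome (e.g., 4-5 stars vs. 1-3 stars)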

Conditions that need to be met:

Independence: Each trial is independent of the others.
Fixed Number of Trials: The number of trials (n) is fixed.
Constant Probability: The probability of success (p) is constant for each trial.
Binary Outcomes: Each trial has only two possible outcomes (success or failure).

Wikipedia Link: https://en.wikipedia.org/wiki/Binomial_distribution
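
The conditions above translate directly into the binomial probability mass function. The sketch below computes it with the standard library only; the helper name binom_pmf and the figures n = 200 visits and p = 0.05 are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_VISITS = 200      # assumed fixed number of trials (website visits)
P_CONVERT = 0.05    # assumed constant conversion probability per visit

pmf = [binom_pmf(k, N_VISITS, P_CONVERT) for k in range(N_VISITS + 1)]

# Expected number of conversions; analytically this is n * p.
mean = sum(k * prob for k, prob in enumerate(pmf))

# Chance of a "good day" with at least 15 conversions.
p_at_least_15 = sum(pmf[15:])

print(f"expected conversions = {mean:.2f}")
print(f"P(conversions >= 15) = {p_at_least_15:.4f}")
```

If visits from the same customer are correlated, or the conversion probability drifts during a campaign, the independence and constant-probability conditions are violated and the observed counts will typically vary more than this model predicts.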

3. Poisson Distribution

The Poisson Distribution is a discrete distribution that models the number of events that occur in a fixed interval, such as the number of website visits, orders, or phone calls. It is defined by a single parameter, lambda (λ), which represents the average number of events per interval and is also the variance of the count. The Poisson Distribution assumes that events are independent and occur at a constant rate, making it a useful model for count data. The following business metrics can often be modelled with a Poisson Distribution:

Number of orders per customer
Number of items per order
Website traffic (visits, page views, etc.)
Customer complaints or returns

Conditions that need to be met:

Independence: Each event is independent of the others.
Constant Rate: The average rate of events (λ) is constant over the interval.
Fixed Interval: The events occur in a fixed interval of time or space.
Rare Events: The probability of an event occurring in a small interval is small.

Wikipedia Link: https://en.wikipedia.org/wiki/Poisson_distribution
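
To make the role of λ concrete, here is a small standard-library sketch of the Poisson probability mass function. The helper name poisson_pmf and the rate of 4 orders per hour are assumptions for illustration only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the average count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


LAM = 4.0  # assumed average of 4 orders per hour

# Probability of an hour with no orders at all.
p_zero = poisson_pmf(0, LAM)

# Probability of an unusually busy hour with more than 8 orders.
p_more_than_8 = 1.0 - sum(poisson_pmf(k, LAM) for k in range(9))

# For a Poisson count the mean and the variance both equal lam
# (the sums are truncated at k = 59, where the remaining mass is negligible).
mean = sum(k * poisson_pmf(k, LAM) for k in range(60))
variance = sum((k - mean) ** 2 * poisson_pmf(k, LAM) for k in range(60))

print(f"P(0 orders)  = {p_zero:.4f}")
print(f"P(> 8 orders) = {p_more_than_8:.4f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Comparing the sample mean and sample variance of real counts is a simple first check: if the variance is much larger than the mean, the constant-rate or independence condition is probably not met.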

4. Exponential Distribution

The Exponential Distribution is a continuous distribution that models the time between events in a Poisson process, such as the time between website visits or between repeat purchases by a loyal customer. It is defined by a single parameter, lambda (λ), which represents the rate at which events occur; the mean time between events is 1/λ. The Exponential Distribution is memoryless, meaning that the remaining waiting time until the next event does not depend on how much time has already elapsed. The following business metrics may follow an Exponential Distribution:

Time between orders
Time spent on the website
Customer loyalty (time until a customer's next repeat purchase)

Conditions that need to be met:

Independence: Each event is independent of the others.
Constant Rate: The average rate of events (λ) is constant over time.
Memoryless: The remaining time until the next event does not depend on how much time has already elapsed since the last event.
Continuous Time: The events occur in continuous time.

Wikipedia Link: https://en.wikipedia.org/wiki/Exponential_distribution
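
The memoryless property is easier to see numerically. The sketch below checks it both from the survival function P(T > t) = exp(-λt) and on simulated data using the standard library's random.expovariate. The rate of 0.5 orders per day and the waiting times s = 3 and t = 2 days are assumed values for illustration.

```python
import math
import random

RATE = 0.5  # assumed: 0.5 orders per day, i.e. a mean gap of 1 / RATE = 2 days


def survival(t: float, rate: float = RATE) -> float:
    """P(T > t) for an exponential waiting time with the given rate."""
    return math.exp(-rate * t)


# Memorylessness, analytically: P(T > s + t | T > s) equals P(T > t).
s, t = 3.0, 2.0
conditional = survival(s + t) / survival(s)
unconditional = survival(t)

# The same check on simulated gaps (seeded so the run is reproducible).
rng = random.Random(0)
gaps = [rng.expovariate(RATE) for _ in range(100_000)]
already_waited = [g for g in gaps if g > s]
empirical_conditional = sum(g > s + t for g in already_waited) / len(already_waited)
empirical_unconditional = sum(g > t for g in gaps) / len(gaps)
empirical_mean = sum(gaps) / len(gaps)

print(f"analytic:  {conditional:.4f} vs {unconditional:.4f}")
print(f"simulated: {empirical_conditional:.4f} vs {empirical_unconditional:.4f}")
print(f"mean gap:  {empirical_mean:.3f} days (theory: {1 / RATE:.3f})")
```

Real customers are rarely memoryless (someone who has not ordered for months is usually less likely to order tomorrow), so a mismatch between the two simulated-style ratios on real data suggests a different model, such as the Weibull Distribution, may fit better.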

5. Pareto Distribution (Power Law)

The Pareto Distribution, also known as the Power Law Distribution, is a continuous distribution that models the distribution of values with a long tail, such as customer value or product popularity. It is defined by two parameters: a scale parameter (x_m), which is the minimum possible value, and a shape parameter, alpha (α), which controls how heavy the tail is. The Pareto Distribution is commonly used to model variables that follow the 80/20 rule, where most values are small and a few extreme values account for most of the total. The following business metrics may follow a Pareto Distribution:

Customer value (80/20 rule: 20% of customers generate 80% of revenue)
Product popularity (80/20 rule: 20% of products generate 80% of sales)

Conditions that need to be met:

Heavy-tailed: The distribution has a heavy tail, meaning that extreme values are more common than in a normal distribution.
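Positive Values: The values are positive and no smaller than a positive minimum value (the scale parameter).
Scale Invariance: The distribution is scale-invariant, meaning that it looks the same at different scales.
Alpha Parameter: The shape parameter (α) is greater than 0; the mean is finite only when α is greater than 1.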

Wikipedia Link: https://en.wikipedia.org/wiki/Pareto_distribution
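
The link between the shape parameter and the 80/20 rule can be worked out in a few lines. For a Pareto Distribution with α greater than 1, the share of the total held by the top fraction q of the population is q^(1 - 1/α). The sketch below applies that formula; the function name top_share is hypothetical, and the result holds only for data that is actually Pareto-distributed.

```python
import math


def top_share(q: float, alpha: float) -> float:
    """Share of the total held by the top fraction q, for a Pareto shape alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return q ** (1.0 - 1.0 / alpha)


# Shape parameter for which the top 20% hold exactly 80% of the total.
alpha_80_20 = math.log(5) / math.log(4)

print(f"alpha for the 80/20 rule: {alpha_80_20:.3f}")
for alpha in (alpha_80_20, 1.5, 2.0, 3.0):
    print(f"alpha = {alpha:.2f}: top 20% hold {top_share(0.2, alpha):.1%} of the total")
```

Larger values of α give a lighter tail and a more even split, so an estimated α well above the 80/20 value indicates that revenue is less concentrated than the rule of thumb suggests.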

6. Lognormal Distribution

The Lognormal Distribution is a continuous distribution that models the distribution of values with a long tail and positive skew, such as customer lifetime value or stock prices. It is defined by two parameters: mu (μ) and sigma (σ), which are the mean and standard deviation of the logarithm of the variable, not of the variable itself. The Lognormal Distribution is commonly used to model variables that have a natural lower bound of zero, such as prices or values. The following business metrics can often be modelled with a Lognormal Distribution:

Customer lifetime value (CLV) with a long tail
Stock prices (if you're selling stocks or cryptocurrencies)

Conditions that need to be met:

Positive Values: The values are positive.
Skewed Distribution: The distribution is skewed to the right.
Mean and Variance: The mean (μ) and variance (σ^2) of the logarithm of the values are finite and constant.
Logarithmic Transformation: The logarithm of the values is normally distributed.

Wikipedia Link: https://en.wikipedia.org/wiki/Lognormal_distribution
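
The logarithmic-transformation condition suggests a simple check: take logs and see whether the result looks normal. The sketch below simulates values with the standard library's random.lognormvariate and recovers the parameters from the logs. The values μ = 3.0 and σ = 0.8 are assumptions for illustration, not estimates from real data.

```python
import math
import random
import statistics

MU, SIGMA = 3.0, 0.8  # assumed parameters on the log scale

# Simulate hypothetical customer values (seeded so the run is reproducible).
rng = random.Random(42)
values = [rng.lognormvariate(MU, SIGMA) for _ in range(100_000)]

# On the log scale the data should look normal with mean MU and sd SIGMA.
logs = [math.log(v) for v in values]
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)

# On the original scale the mean sits above the median because of the right skew.
sample_median = statistics.median(values)
sample_mean = statistics.fmean(values)
theoretical_median = math.exp(MU)
theoretical_mean = math.exp(MU + SIGMA**2 / 2)

print(f"log scale: mean = {log_mean:.3f}, sd = {log_sd:.3f}")
print(f"median: sample = {sample_median:.2f}, theory = {theoretical_median:.2f}")
print(f"mean:   sample = {sample_mean:.2f}, theory = {theoretical_mean:.2f}")
```

The gap between mean and median is why reporting only the average of a lognormal metric can overstate what a typical customer is worth.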

7. Weibull Distribution

The Weibull Distribution is a continuous distribution that models the distribution of lifetimes or survival times, such as product lifetimes or the time until a customer churns. It is defined by two parameters: alpha (α) and beta (β), which represent the shape and scale of the distribution, respectively. The Weibull Distribution is commonly used to model variables that have a natural lower bound of zero, such as lifetimes or survival times. The following business metrics can often be modelled with a Weibull Distribution:

Product lifetimes (e.g., warranty claims, returns)
Customer retention (time until a customer churns)

Conditions that need to be met:

Positive Values: The values are positive.
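Skewed Distribution: The distribution is typically skewed to the right; this holds for shape parameters below roughly 3.6, while larger values give a nearly symmetric or slightly left-skewed shape.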
Shape Parameter: The shape parameter (α) is greater than 0.
Scale Parameter: The scale parameter (β) is greater than 0.

Wikipedia Link: https://en.wikipedia.org/wiki/Weibull_distribution
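
What the shape parameter means in practice is clearest from the hazard rate, the instantaneous failure (or churn) rate among items that have survived so far. The sketch below implements the survival function exp(-(t/β)^α) and the hazard (α/β)(t/β)^(α-1) directly, with α as shape and β as scale as in the text above. The function names and the 24-month scale are assumptions for illustration.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(T > t): probability that a lifetime exceeds t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at time t, given survival up to t."""
    return (shape / scale) * (t / scale) ** (shape - 1)


SCALE = 24.0  # assumed characteristic lifetime in months

for shape in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, shape, SCALE) for t in (6.0, 12.0, 24.0)]
    trend = ("decreasing" if hazards[0] > hazards[-1]
             else "increasing" if hazards[0] < hazards[-1]
             else "constant")
    survive_1y = weibull_survival(12.0, shape, SCALE)
    print(f"shape = {shape}: hazard is {trend}, P(survive 12 months) = {survive_1y:.3f}")
```

A shape below 1 describes early failures or early churn, a shape of exactly 1 reduces to the Exponential Distribution with a constant rate, and a shape above 1 describes wear-out. Note that Python's own random.weibullvariate(alpha, beta) names its arguments the other way round, with alpha as the scale and beta as the shape.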

Note that this list is not exhaustive, and the appropriate distribution may vary depending on the specific e-commerce business and its data. Understanding these distributions will help us model and analyze our data more effectively, making informed business decisions easier.

Last updated 1 year ago