Data Science Hub

SQL Descriptive Stats

Last updated 10 months ago

Descriptive statistics summarize the main characteristics of a data set. PostgreSQL offers several functions for descriptive statistical analysis directly within SQL, including the ordered-set aggregates PERCENTILE_DISC, PERCENTILE_CONT, and MODE, along with the more common statistical aggregate functions COUNT, AVG, SUM, MIN, and MAX.

Understanding Descriptive Statistics

Descriptive statistics describe the main features of a collection of data quantitatively. They are used to summarize data sets, and they include measures such as:

  1. Count - number of rows

  2. Mean (Average) - average value of a numeric column

  3. Sum - total of a numeric column

  4. Minimum and Maximum Values - smallest and largest values of the selected column

  5. Percentiles - values below which a given percentage of the observations fall (the 50th percentile is the median)

  6. Mode - value that appears most frequently

Setting Up Your Data

Let's consider a simple table named sales that contains sales data:

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    amount NUMERIC
);

INSERT INTO sales (amount) VALUES
(10.0), (20.0), (10.0), (50.0), (30.0), (20.0), (40.0), (50.0), (30.0), (60.0);

Number of Rows

The COUNT() function returns the number of rows in a table.

Example:

SELECT COUNT(*) AS total_rows
FROM sales;

This query returns the total number of rows in the sales table.
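
As a small sketch of how COUNT() treats its argument (the column aliases are illustrative, and the sales table with its sample rows is assumed to exist as defined above): COUNT(*) counts every row, COUNT(amount) skips rows where amount is NULL, and COUNT(DISTINCT amount) counts only unique non-NULL values.

```sql
-- Assumes the sales table and sample rows from "Setting Up Your Data".
SELECT
    COUNT(*)               AS total_rows,        -- every row, NULLs included
    COUNT(amount)          AS non_null_amounts,  -- rows where amount IS NOT NULL
    COUNT(DISTINCT amount) AS distinct_amounts   -- unique non-NULL amounts
FROM sales;
```

With the ten sample rows, which contain no NULLs and six different amounts, the three counts are 10, 10, and 6.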

Mean (Average)

The AVG() function calculates the mean of a numeric column.

Example:

SELECT AVG(amount) AS average_sale
FROM sales;

This query returns the average value of the amount column in the sales table.
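
AVG() ignores NULL values, and on a NUMERIC column it can return a long run of decimal places. A minimal sketch of rounding the result for display, assuming two decimal places are wanted:

```sql
-- Assumes the sales table and sample rows defined above.
-- ROUND(numeric, integer) keeps the given number of decimal places.
SELECT ROUND(AVG(amount), 2) AS average_sale
FROM sales;
```

For the sample data the amounts add up to 320 across 10 rows, so the rounded average is 32.00.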

Sum

The SUM() function adds up the values of a numeric column, ignoring NULLs.

Example:

SELECT SUM(amount) AS total_sales
FROM sales;

This query returns the total of the amount column in the sales table.
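
A total is often needed for only part of the table. One way to sketch that in PostgreSQL is the FILTER clause on an aggregate; the threshold of 30 and the alias total_large_sales are arbitrary choices for illustration.

```sql
-- Assumes the sales table and sample rows defined above.
SELECT
    SUM(amount)                             AS total_sales,
    SUM(amount) FILTER (WHERE amount >= 30) AS total_large_sales  -- only rows with amount >= 30
FROM sales;
```

With the sample data the overall total is 320 and the filtered total is 260.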

Minimum and Maximum Values

The MIN() and MAX() functions return the smallest and largest values in a column, respectively.

Example:

SELECT MIN(amount) AS smallest_sale, MAX(amount) AS largest_sale
FROM sales;

This query returns the smallest and largest values in the amount column in the sales table.
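
The two results can also be combined in the same query, for instance to get the spread between the extremes. A short sketch (the alias sale_range is illustrative):

```sql
-- Assumes the sales table and sample rows defined above.
SELECT MAX(amount) - MIN(amount) AS sale_range
FROM sales;
```

For the sample data this is 60 minus 10, which gives 50.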

Percentiles

A percentile is the value below which a given percentage of the observations fall; for example, the 50th percentile is the median. PostgreSQL provides two functions for calculating percentiles: PERCENTILE_DISC and PERCENTILE_CONT.

  • PERCENTILE_DISC: Discrete percentile calculation.

  • PERCENTILE_CONT: Continuous percentile calculation.

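Both are ordered-set aggregate functions: the percentile is passed as a fraction between 0 and 1, and the WITHIN GROUP (ORDER BY ...) clause names the column to sort and evaluate.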

Example:

SELECT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont
FROM sales;

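This query calculates the median (50th percentile) of the amount column using both the discrete and continuous methods. With the sample data both return 30, because the fifth and sixth of the ten sorted values are both 30.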

The PERCENTILE_DISC() function returns the first value in the sorted input whose position reaches or exceeds the requested fraction. The value returned always exists in the set.

The PERCENTILE_CONT() function linearly interpolates between the two adjacent sorted values when the requested percentile falls between them. The value returned may or may not exist in the set.
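
The two functions diverge when the requested percentile falls between two different values. A minimal sketch using an inline four-row VALUES list (the alias t(x) is made up for illustration and is not part of the sales table):

```sql
-- Four sorted values: 10, 20, 30, 40. The median falls between 20 and 30.
SELECT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY x) AS median_disc,  -- picks an existing value
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x) AS median_cont   -- interpolates between neighbors
FROM (VALUES (10), (20), (30), (40)) AS t(x);
```

Here median_disc is 20, the second of the four values, while median_cont is 25, the midpoint between 20 and 30.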

When to Use Which?

  • PERCENTILE_DISC: Use when the result must be an actual observation from the dataset, or when the column is not numeric. It works on any sortable type, such as text or dates.

  • PERCENTILE_CONT: Use with continuous numeric data when an interpolated value is acceptable, even if it does not correspond to an actual observation in the dataset. It accepts only double precision or interval inputs; other numeric types are cast to double precision.
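
Both functions also accept an array of fractions, which makes it possible to compare them at several percentiles at once. A sketch computing the quartiles of the sample data (the aliases are illustrative):

```sql
-- Assumes the sales table and sample rows defined above.
-- Passing an array of fractions returns an array of results in the same order.
SELECT
    PERCENTILE_DISC(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY amount) AS quartiles_disc,
    PERCENTILE_CONT(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY amount) AS quartiles_cont
FROM sales;
```

For the sample data the discrete quartiles are 20, 30, and 50, while the continuous quartiles are 20, 30, and 47.5. The third quartile differs because it falls between the sorted values 40 and 50.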

Mode

The mode is the value that appears most frequently in a dataset. Similar to percentile functions, the MODE() function is also used with the WITHIN GROUP clause.

Example:

SELECT MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;

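This query returns the mode of the amount column in the sales table. If several values tie for the highest frequency, MODE() returns only one of them, the first it encounters in the sort order. In the sample data 10, 20, 30, and 50 each appear twice, so the query returns 10.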
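
When several values share the highest frequency, a single MODE() result hides the others. One possible way to list every mode is to group by value and keep the groups whose count equals the maximum count; the aliases frequency, freq, and counts below are illustrative.

```sql
-- Assumes the sales table and sample rows defined above.
SELECT amount, COUNT(*) AS frequency
FROM sales
GROUP BY amount
HAVING COUNT(*) = (
    -- highest frequency of any single amount
    SELECT MAX(freq)
    FROM (SELECT COUNT(*) AS freq FROM sales GROUP BY amount) AS counts
)
ORDER BY amount;
```

With the sample data this returns four rows (10, 20, 30, and 50), each with a frequency of 2.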

Combining Descriptive Statistics

You can combine multiple descriptive statistics in a single query to get a comprehensive summary of your data.

Example:

SELECT
    AVG(amount) AS average_sale,
    SUM(amount) AS total_sales,
    MIN(amount) AS smallest_sale,
    MAX(amount) AS largest_sale,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont,
    MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;

This query summarizes the amount column in a single pass over the table: the average, total, minimum, maximum, median (both discrete and continuous), and mode.
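
The same aggregates can be computed per group by adding GROUP BY. The sales table has no category column, so the sketch below invents a size_bucket label with a CASE expression purely for illustration; the cutoff of 30 is arbitrary.

```sql
-- Assumes the sales table and sample rows defined above.
-- size_bucket is a made-up grouping derived from amount, not a real column.
SELECT
    CASE WHEN amount >= 30 THEN 'large' ELSE 'small' END AS size_bucket,
    COUNT(*)                                             AS total_rows,
    ROUND(AVG(amount), 2)                                AS average_sale,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount)  AS median_cont
FROM sales
GROUP BY size_bucket
ORDER BY size_bucket;
```

For the sample data the large bucket holds 6 rows with an average of 43.33 and a median of 45, and the small bucket holds 4 rows with an average of 15.00 and a median of 15.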

Conclusion

PostgreSQL's built-in aggregate functions let you compute descriptive statistics directly within the database, without exporting the data to another tool. With COUNT, AVG, SUM, MIN, MAX, PERCENTILE_DISC, PERCENTILE_CONT, and MODE, you can summarize the size, center, and range of a column in a single query.
