SQL Aggregate Functions

Last updated 7 months ago

SQL Aggregate Functions are essential tools for performing calculations on multiple rows of a table's column and returning a single value. These functions are fundamental in data analysis and reporting.

What Are Aggregate Functions?

Aggregate functions perform a calculation on a set of values and return a single value. They are often used in conjunction with the GROUP BY clause to group rows that have the same values in specified columns into aggregate data. Below are the most commonly used aggregate functions:

  • COUNT()

  • SUM()

  • AVG()

  • MIN()

  • MAX()

For the entire list of aggregate functions in PostgreSQL, see the PostgreSQL documentation linked at the end of this page.

Descriptions and Examples:

  1. COUNT()

The COUNT() function returns the number of input rows. COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL. Combined with a WHERE clause, it counts the rows that match a specific condition.

Example:

SELECT COUNT(*) AS total_rows
FROM orders;

This query returns the total number of rows in the orders table.
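The difference between counting rows and counting non-NULL values can be sketched with Python's built-in sqlite3 module. The in-memory orders table and its four rows are made up for illustration; COUNT(DISTINCT ...) is added here only as a related variant.

```python
import sqlite3

# Hypothetical in-memory orders table; one amount is NULL on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, None), (3, 2, 20.0), (4, 2, 20.0)],
)

# COUNT(*) counts rows, COUNT(amount) skips NULLs,
# COUNT(DISTINCT amount) counts unique non-NULL values.
total_rows, non_null_amounts, distinct_amounts = conn.execute(
    "SELECT COUNT(*), COUNT(amount), COUNT(DISTINCT amount) FROM orders"
).fetchone()

print(total_rows, non_null_amounts, distinct_amounts)  # 4 3 2
```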

  2. SUM()

The SUM() function adds up the values in a numeric column. NULL values are ignored.

Example:

SELECT SUM(amount) AS total_sales
FROM orders;

This query returns the sum of the amount column in the orders table.

  3. AVG()

The AVG() function calculates the average (arithmetic mean) of a numeric column. NULL values are excluded from both the sum and the row count.

Example:

SELECT AVG(amount) AS average_sale
FROM orders;

This query returns the average value of the amount column in the orders table.

  4. MIN()

The MIN() function returns the smallest value in a column. It works on any orderable type, including numbers, strings, and dates.

Example:

SELECT MIN(amount) AS smallest_sale
FROM orders;

This query returns the smallest value in the amount column in the orders table.

  5. MAX()

The MAX() function returns the largest value in a column. It works on any orderable type, including numbers, strings, and dates.

Example:

SELECT MAX(amount) AS largest_sale
FROM orders;

This query returns the largest value in the amount column in the orders table.
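As a quick check of how SUM(), AVG(), MIN(), and MAX() behave together, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are hypothetical. One amount is NULL to show that these aggregates skip NULLs, so the average is taken over three rows rather than four.

```python
import sqlite3

# Hypothetical orders table with one NULL amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, None), (4, 30.0)],
)

total, average, smallest, largest = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders"
).fetchone()

# The NULL row is ignored: 60.0 / 3 rows = 20.0
print(total, average, smallest, largest)  # 60.0 20.0 10.0 30.0
```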

Using Aggregate Functions with GROUP BY

Aggregate functions are often used with the GROUP BY clause to group rows that have the same values in specified columns into summary rows.

Example:

SELECT 
    customer_id, 
    COUNT(*) AS orders_count, 
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;

This query groups the rows by customer_id and calculates the number of orders and the total amount spent by each customer.
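To see what the grouped result looks like, the sketch below runs the same query against a small, made-up orders table using Python's built-in sqlite3 module. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical orders: customer 1 and 3 have two orders each, customer 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 40.0), (3, 900.0), (3, 300.0)],
)

per_customer = conn.execute(
    """
    SELECT customer_id,
           COUNT(*) AS orders_count,
           SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

# One summary row per customer_id.
print(per_customer)  # [(1, 2, 350.0), (2, 1, 40.0), (3, 2, 1200.0)]
```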

HAVING Clause

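The HAVING clause filters groups based on a condition, typically one involving an aggregate function. It is similar to the WHERE clause, but WHERE filters individual rows before grouping and therefore cannot reference aggregate functions, whereas HAVING is applied after the groups have been aggregated.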

Example:

SELECT 
    customer_id, 
    COUNT(*) AS orders_count, 
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;

This query returns only those customers who have spent more than 1000 in total.
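The sketch below contrasts HAVING with WHERE on a small, made-up orders table, using Python's built-in sqlite3 module. The second query and its thresholds (500 and 300) are illustrative assumptions, chosen to show that WHERE removes rows before the sums are computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 40.0), (3, 900.0), (3, 300.0)],
)

# HAVING filters whole groups after SUM() has been computed.
big_spenders = conn.execute(
    """
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
    """
).fetchall()

# WHERE removes individual rows first, so the 900.0 order never reaches SUM().
small_orders_only = conn.execute(
    """
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    WHERE amount < 500
    GROUP BY customer_id
    HAVING SUM(amount) > 300
    ORDER BY customer_id
    """
).fetchall()

print(big_spenders)       # [(3, 1200.0)]
print(small_orders_only)  # [(1, 350.0)]
```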

Ordered-Set Aggregate Functions

Key Features of Ordered-Set Aggregate Functions:

  • Order-Sensitive: These functions require the input values to be ordered, which is specified with WITHIN GROUP (ORDER BY ...).

  • Percentile Calculation: They are often used for calculating quantiles such as quartiles or percentiles and other statistical measures that depend on the rank or order of values.

  • Additional Parameters: They often take an additional direct argument, such as the percentile fraction (a value between 0 and 1).

Common Ordered-Set Aggregate Functions:

  • PERCENTILE_DISC

  • PERCENTILE_CONT

  • MODE

  • RANK and DISTRIBUTION Functions

  • PERCENT_RANK

Descriptions and Examples:

  1. PERCENTILE_DISC (Discrete Percentile)

Returns the first value in the ordered dataset whose position in the ordering equals or exceeds the specified fraction. The result is always an actual value from the dataset.

Example:

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc
FROM sales;

This query returns the median value from the sales table, selecting an actual value from the dataset.

  2. PERCENTILE_CONT (Continuous Percentile)

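Returns the value at the specified percentile, linearly interpolating between adjacent values when the percentile falls between two rows. The result may not be an actual value from the dataset.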

Example:

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont
FROM sales;

This query calculates the median value by interpolating between the values in the sales table.
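The difference between the two percentile functions is easiest to see on an even number of rows. The pure-Python sketch below mimics their semantics as described above. It is not how any database implements them, and the helper names and sample amounts are hypothetical. It assumes a non-empty list without NULLs and a fraction between 0 and 1.

```python
import math


def percentile_disc(values, fraction):
    """Return the first sorted value whose position covers `fraction` of the rows."""
    ordered = sorted(values)
    # 1-based row number; a fraction of 0 maps to the first row.
    row = max(1, math.ceil(fraction * len(ordered)))
    return ordered[row - 1]


def percentile_cont(values, fraction):
    """Return the value at `fraction`, interpolating linearly between neighbors."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # 0-based fractional index
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


amounts = [10, 20, 30, 40]
median_disc = percentile_disc(amounts, 0.5)
median_cont = percentile_cont(amounts, 0.5)

print(median_disc)  # 20   (an actual value from the data)
print(median_cont)  # 25.0 (halfway between 20 and 30)
```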

  3. MODE

Returns the most frequently occurring value in the dataset.

Example:

SELECT MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;

This query returns the mode (most frequent value) of the amount column in the sales table.
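Not every engine supports MODE() WITHIN GROUP (SQLite, for example, does not). A rough equivalent can be sketched with GROUP BY and ORDER BY, shown here through Python's built-in sqlite3 module on a made-up sales table. Breaking ties by the smallest amount is an assumption of this sketch, not documented behavior of MODE().

```python
import sqlite3

# Hypothetical sales table: 20.0 and 30.0 both occur twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [(10.0,), (20.0,), (20.0,), (30.0,), (30.0,)],
)

# Count each amount, then keep the most frequent one.
# Ties are broken by the smallest amount (an assumption of this sketch).
mode_amount, occurrences = conn.execute(
    """
    SELECT amount, COUNT(*) AS occurrences
    FROM sales
    GROUP BY amount
    ORDER BY occurrences DESC, amount ASC
    LIMIT 1
    """
).fetchone()

print(mode_amount, occurrences)  # 20.0 2
```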

  4. RANK and DISTRIBUTION Functions

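Functions like RANK, DENSE_RANK, and CUME_DIST provide information about the rank and distribution of values within a partition. In the example here they are used as window functions (with OVER), which return one value per row rather than one value per group. PostgreSQL also provides them as hypothetical-set aggregates that use the WITHIN GROUP syntax.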

Example:

SELECT amount,
       RANK() OVER (ORDER BY amount) AS rank,
       DENSE_RANK() OVER (ORDER BY amount) AS dense_rank,
       CUME_DIST() OVER (ORDER BY amount) AS cumulative_distribution
FROM sales;

This query assigns a rank, dense rank, and cumulative distribution value to each amount in the sales table.
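The effect of ties on these three functions can be checked with a small sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The sales rows and the column aliases are hypothetical.

```python
import sqlite3

# Hypothetical sales table with one tie (two rows of 20.0).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [(10.0,), (20.0,), (20.0,), (30.0,)],
)

ranked = conn.execute(
    """
    SELECT amount,
           RANK() OVER (ORDER BY amount) AS amount_rank,
           DENSE_RANK() OVER (ORDER BY amount) AS amount_dense_rank,
           CUME_DIST() OVER (ORDER BY amount) AS cumulative_distribution
    FROM sales
    ORDER BY amount
    """
).fetchall()

# RANK leaves a gap after the tie (1, 2, 2, 4); DENSE_RANK does not (1, 2, 2, 3).
# CUME_DIST is the share of rows with an amount <= the current one.
for row in ranked:
    print(row)
# (10.0, 1, 1, 0.25)
# (20.0, 2, 2, 0.75)
# (20.0, 2, 2, 0.75)
# (30.0, 4, 3, 1.0)
```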

  5. PERCENT_RANK (Relative Rank)

Calculates the relative rank of a row within a result set as a fraction between 0 and 1 (multiply by 100 for a percentage).

The formula for PERCENT_RANK() is:

$$\text{PERCENT\_RANK} = \frac{\text{rank} - 1}{\text{total\_rows} - 1}$$

where rank is the row's RANK() in the ordered set, starting from 1 (tied rows share the same rank), and total_rows is the number of rows in the result set.

Example:

SELECT
    amount,
    PERCENT_RANK() OVER (ORDER BY amount) AS percent_rank
FROM sales;

This query calculates the relative rank of each row, as a fraction between 0 and 1, among all rows in the sales table.

Result
amount	percent_rank
--------------------
10.0	0.000000
10.0	0.000000
20.0	0.222222
20.0	0.222222
30.0	0.444444
30.0	0.444444
40.0	0.666667
50.0	0.777778
50.0	0.777778
60.0	1.000000
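As a sanity check, the PERCENT_RANK formula can be applied by hand to the ten amounts in the result. The pure-Python sketch below does this. The variable names are illustrative, and it assumes more than one row so that the denominator is not zero.

```python
# The ten amounts from the result, in sorted order.
amounts = [10, 10, 20, 20, 30, 30, 40, 50, 50, 60]
ordered = sorted(amounts)
total_rows = len(ordered)

# RANK(): 1-based position of the first row with the same value, so ties share a rank.
ranks = [ordered.index(amount) + 1 for amount in ordered]

# PERCENT_RANK = (rank - 1) / (total_rows - 1), rounded to six decimals for display.
percent_ranks = [round((rank - 1) / (total_rows - 1), 6) for rank in ranks]

print(ranks)
# [1, 1, 3, 3, 5, 5, 7, 8, 8, 10]
print(percent_ranks)
# [0.0, 0.0, 0.222222, 0.222222, 0.444444, 0.444444, 0.666667, 0.777778, 0.777778, 1.0]
```

Both rows with amount 50 have rank 8, which gives (8 - 1) / 9 ≈ 0.777778.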

Usage and Benefits

Ordered-set aggregate functions in SQL are a special class of aggregate functions that operate on a set of values and take into account the order of those values. Unlike traditional aggregate functions, which treat the input as an unordered set, ordered-set aggregates consider the input sequence, which is crucial for certain statistical and analytical calculations.

  • Statistical Analysis: Ordered-set aggregate functions are ideal for statistical analysis, where the order of data points is crucial.

  • Data Summarization: They help summarize data in meaningful ways, such as finding medians, modes, and percentiles.

  • Performance: Using these functions can improve performance by leveraging database capabilities for complex calculations, reducing the need for extensive client-side processing.

PostgreSQL documentation, Aggregate Functions: https://www.postgresql.org/docs/9.4/functions-aggregate.html