# SQL Aggregate Functions

SQL Aggregate Functions are essential tools for performing calculations on multiple rows of a table's column and returning a single value. These functions are fundamental in data analysis and reporting.&#x20;

## **What Are Aggregate Functions?**

Aggregate functions perform a calculation on a set of values and return a single value. They are often used in conjunction with the `GROUP BY` clause to group rows that have the same values in specified columns into aggregate data. Below are the most commonly used aggregate functions:

* **COUNT()**
* **SUM()**
* **AVG()**
* **MIN()**
* **MAX()**

For the entire list of Aggregate functions in Postgres, please visit <https://www.postgresql.org/docs/9.4/functions-aggregate.html>

**Descriptions and Examples:**

1. **COUNT()**

The `COUNT()` function returns the number of input rows that match a specific condition. It is useful for determining the number of rows in a table or the number of non-NULL values in a column.

**Example:**

```sql
SELECT COUNT(*) AS total_rows
FROM orders;
```

This query returns the total number of rows in the `orders` table.

2. **SUM()**

The `SUM()` function calculates the total sum of a numeric column. It is useful for adding up all the values in a column.

**Example:**

```sql
SELECT SUM(amount) AS total_sales
FROM orders;
```

This query returns the total sum of the `amount` column in the `orders` table.

3. **AVG()**

The `AVG()` function calculates the average value of a numeric column. It is useful for finding the mean value of a set of numbers.

**Example:**

```sql
SELECT AVG(amount) AS average_sale
FROM orders;
```

This query returns the average value of the `amount` column in the `orders` table.

4. **MIN()**

The `MIN()` function returns the smallest value in a column. It is useful for finding the minimum value in a set of values.

**Example:**

```sql
SELECT MIN(amount) AS smallest_sale
FROM orders;
```

This query returns the smallest value in the `amount` column in the `orders` table.

5. **MAX()**

The `MAX()` function returns the largest value in a column. It is useful for finding the maximum value in a set of values.

**Example:**

```sql
SELECT MAX(amount) AS largest_sale
FROM orders;
```

This query returns the largest value in the `amount` column in the `orders` table.

**Using Aggregate Functions with GROUP BY**

Aggregate functions are often used with the `GROUP BY` clause to group rows that have the same values in specified columns into summary rows.

**Example:**

```sql
SELECT 
    customer_id, 
    COUNT(*) AS orders_count, 
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
```

This query groups the rows by `customer_id` and calculates the number of orders and the total amount spent by each customer.

**HAVING Clause**

The `HAVING` clause is used to filter groups based on a condition. It is similar to the `WHERE` clause, but `WHERE` cannot be used with aggregate functions.

**Example:**

```sql
SELECT 
    customer_id, 
    COUNT(*) AS orders_count, 
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;
```

This query returns only those customers who have spent more than 1000 in total.

## Ordered-Set Aggregate Functions

Ordered-set aggregate functions in SQL are a special class of aggregate functions that operate on a set of values and take into account the order of those values. Unlike traditional aggregate functions that treat the input as an unordered set, ordered-set aggregates consider the input sequence, which is crucial for certain [statistical](https://dshub.gitbook.io/ds-hub/sql/sql-descriptive-stats) and analytical calculations.

Key Features of Ordered-Set Aggregate Functions:

* **Order-Sensitive**: These functions require the input values to be ordered.
* **Percentile Calculation**: They are often used for calculating quantiles such as quartiles or percentiles and other statistical measures that depend on the rank or order of values.
* **Additional Parameters**: They often take additional parameters, such as number of quantiles or the percentile rank.

Common Ordered-Set Aggregate Functions:

* **PERCENTILE\_DISC**
* **PERCENTILE\_CONT**
* **MODE**
* **RANK and DISTRIBUTION Functions**
* **PERCENT\_RANK**

**Descriptions and Examples:**

1. **PERCENTILE\_DISC (Discrete Percentile)**

Returns the value from the dataset that corresponds to the specified percentile.

**Example**:

```sql
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc
FROM sales;
```

This query returns the median value from the `sales` table, selecting an actual value from the dataset.

2. **PERCENTILE\_CONT (Continuous Percentile)**

Returns a value interpolated within the dataset for the specified percentile.

**Example**:

```sql
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont
FROM sales;
```

This query calculates the median value by interpolating between the values in the `sales` table.

3. **MODE**

Returns the most frequently occurring value in the dataset.

**Example**:

```sql
SELECT MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;
```

This query returns the mode (most frequent value) of the `amount` column in the `sales` table.

4. **RANK and DISTRIBUTION Functions**

Functions like `RANK`, `DENSE_RANK`, and `CUME_DIST` provide information about the rank and distribution of values within a partition.

**Example**:

```sql
SELECT amount,
       RANK() OVER (ORDER BY amount) AS rank,
       DENSE_RANK() OVER (ORDER BY amount) AS dense_rank,
       CUME_DIST() OVER (ORDER BY amount) AS cumulative_distribution
FROM sales;
```

This query assigns a rank, dense rank, and cumulative distribution value to each `amount` in the `sales` table.

5. **PERCENT\_RANK (Continuous Percentile)**

Calculates the relative rank of a row within a result set as a percentage of the result set.&#x20;

The formula for `PERCENT_RANK()` is:  perc\_rank = ( rank - 1 ) / ( total\_rows - 1 )

```latex
\text{PERCENT_RANK} = \frac{\text{Rank} - 1}{\text{Total Rows} - 1}
```

where the rank is the position of the row in the ordered set, starting from 1.

**Example**:

```sql
SELECT
    amount,
    PERCENT_RANK() OVER (ORDER BY amount) AS percent_rank
FROM sales;
```

This query calculates the rank of each row as a percentage of the total number of rows in the `sales` table.

<details>

<summary>Result</summary>

```markdown
amount	percent_rank
--------------------
10.0	0.000000
10.0	0.000000
20.0	0.222222
20.0	0.222222
30.0	0.444444
30.0	0.444444
40.0	0.666667
50.0	0.888889
50.0	0.888889
60.0	1.000000
```

</details>

## **Usage and Benefits**

* **Statistical Analysis**: Ordered-set aggregate functions are ideal for [statistical analysis](https://dshub.gitbook.io/ds-hub/sql/sql-descriptive-stats), where the order of data points is crucial.
* **Data Summarization**: They help summarize data in meaningful ways, such as finding [medians, modes, and percentiles](#descriptions-and-examples).
* **Performance**: Using these functions can improve performance by leveraging database capabilities for complex calculations, reducing the need for extensive client-side processing.
