# SQL Descriptive Stats

Descriptive statistics provide a way to summarize and understand the main characteristics of a data set. PostgreSQL offers several functions that help perform descriptive statistical analysis directly within SQL. Some of these functions are `PERCENTILE_DISC`, `PERCENTILE_CONT`, and `MODE`, along with more common statistical/[aggregate functions](/ds-hub/sql/sql-aggregate-functions.md) like `AVG`, `SUM`, `MIN`, and `MAX`.

## **Understanding Descriptive Statistics**

Descriptive statistics describe the main features of a collection of data quantitatively. They are used to summarize data sets, and they include measures such as:

1. **Count -** number of rows
2. **Mean** (Average) - average value of a numeric column
3. **Sum -** total of a numeric column
4. **Minimum and Maximum Values -** smallest and largest value of the selected column
5. **Percentiles -** are measures that divide a dataset into 100 equal parts
6. **Mode - v**alue that appears most frequently

### **Setting Up Your Data**

Let's consider a simple table named `sales` that contains sales data:

```sql
CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    amount NUMERIC
);

INSERT INTO sales (amount) VALUES
(10.0), (20.0), (10.0), (50.0), (30.0), (20.0), (40.0), (50.0), (30.0), (60.0);
```

**Number of Rows**

To `COUNT()` function yields the number of rows in a table.&#x20;

**Example:**

```sql
SELECT COUNT(*) AS total_rows
FROM sales;
```

This query returns the total number of rows in the `sales` table.

**Mean (Average)**

The `AVG()` function calculates the mean of a numeric column.

**Example:**

```sql
SELECT AVG(amount) AS average_sale
FROM sales;
```

This query returns the average value of the `amount` column in the `sales` table.

**Sum**

The `SUM()` function calculates the total sum of a numeric column.

**Example:**

```sql
SELECT SUM(amount) AS total_sales
FROM sales;
```

This query returns the total sum of the `amount` column in the `sales` table.

**Minimum and Maximum Values**

The `MIN()` and `MAX()` functions return the smallest and largest values in a column, respectively.

**Example:**

```sql
SELECT MIN(amount) AS smallest_sale, MAX(amount) AS largest_sale
FROM sales;
```

This query returns the smallest and largest values in the `amount` column in the `sales` table.

**Percentiles**

Percentiles are measures that divide a dataset into 100 equal parts. PostgreSQL provides two functions for calculating percentiles: `PERCENTILE_DISC` and `PERCENTILE_CONT`.

* **PERCENTILE\_DISC**: Discrete percentile calculation.
* **PERCENTILE\_CONT**: Continuous percentile calculation.

Both functions are used with the `WITHIN GROUP` clause.

**Example:**

```sql
SELECT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont
FROM sales;
```

This query calculates the median (50th percentile) of the `amount` column using both discrete and continuous methods.

{% hint style="info" %}
The `PERCENTILE_DISC()` function returns a value from the input dataset that is the closest to the percentile requested. The value returned will actually exist in the set.&#x20;

The `PERCENTILE_CONT()` function returns an interpolated value between multiple values based on the distribution. The value returned may or may not exist in the set.&#x20;
{% endhint %}

When to Use Which?

* **PERCENTILE\_DISC**: Use when the exact value from the dataset is important, such as when working with categorical data or when you need an actual observation.
* **PERCENTILE\_CONT**: Use when a more precise value is needed, such as when working with continuous data, and the percentile may not correspond directly to an actual observation in the dataset.

**Mode**

The mode is the value that appears most frequently in a dataset. Similar to percentile functions, the `MODE()` function is also used with the `WITHIN GROUP` clause.

**Example:**

```sql
SELECT MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;
```

This query returns the mode of the `amount` column in the `sales` table.

## **Combining Descriptive Statistics**

You can combine multiple descriptive statistics in a single query to get a comprehensive summary of your data.

**Example:**

```sql
SELECT
    AVG(amount) AS average_sale,
    SUM(amount) AS total_sales,
    MIN(amount) AS smallest_sale,
    MAX(amount) AS largest_sale,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont,
    MODE() WITHIN GROUP (ORDER BY amount) AS mode_amount
FROM sales;
```

This query provides a complete summary of the `amount` column, including the average, total sum, minimum, maximum, median (both discrete and continuous), and mode.

## **Conclusion**

Descriptive statistics in PostgreSQL can be efficiently performed using built-in SQL functions. These functions help you summarize and understand your data directly within the database. By utilizing functions like `AVG`, `SUM`, `MIN`, `MAX`, `PERCENTILE_DISC`, `PERCENTILE_CONT`, and `MODE`, you can perform a comprehensive statistical analysis of your data sets. Understanding and applying these functions will enhance your data analysis capabilities in PostgreSQL.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://dshub.gitbook.io/ds-hub/sql/sql-descriptive-stats.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
