SQL Tips
1. AGGREGATION
Aggregations occur only vertically. If you perform a calculation across rows, you will simply do basic arithmetic:
# Aggregation
SELECT
SUM(sneakers_sales) AS total_sneakers_sales_2022
FROM sales
WHERE sales_year = 2022
# Arithmetic
SELECT
sneakers_sales + running_shoes_sales AS shoe_sales_by_month
FROM sales
WHERE sales_year = 2022 Note: You would not need to worry as much about the presence of NULLs with SUM as you would with COUNT, as SUM treats nulls as 0.
AVG can only be used with numerical data and ignores NULLs completely.
The difference can be easier to understood in the context of an example:
-- Query 1
SELECT
AVG(low) as avg_low1,
SUM(low)/COUNT(*) as avg_low2 -- Counting all rows
FROM aapl_historical_stock_price;
-- NOTE: avg_low1 is not equal to avg_low2
-- Query 2
SELECT
AVG(low) as avg_low1,
SUM(low)/COUNT(*) as avg_low2 -- Counting all rows
FROM aapl_historical_stock_price
WHERE low IS NOT NULL;
-- NOTE: avg_low1 is equal to avg_low2
-- Query 3
SELECT
AVG(low) as avg_low1,
SUM(low)/COUNT(low) as avg_low2 -- Counting only rows in column 'low'
FROM aapl_historical_stock_price
-- NOTE: avg_low1 is equal to avg_low2
2. CASE
Goes in the
SELECTclause (can also be in GROUP BY)Must include:
WHEN,THEN, andEND.ELSEis optional.Multiple conditional statements may be added between
WHENandTHENusingANDandOR.Multiple
WHENstatements, as well as anELSEstatement can be used to deal with any unaddressed conditions.
Create CASE statements that don't overlap.
A. CASE and AGGREGATION
The following query shows the number of songs breakdown by the a group's type. If the group consists of only one singer it is considered as 'Single', if it is a band then it is 'Band', and if it is a collaborative work of various artists, then it is categorized and 'Collaboration'.
Similar task with Apple's stock prices:
3. PIVOTING
Pivoting is displaying a dataset horizontally that is originally in a vertical structure, or vice-versa.
3.1. VERTICAL to HORIZONTAL (Rows to Columns)
Horizontally re-orienting, i.e. pivoting, the above query :
Another example school names. We will be displaying the number of schools with respect to their first letter in their names:
We can pivot the query as:
3.2. HORIZONTAL TO VERTICAL (Columns to Rows)
Considering the structure of horizontal tables, we might not be able to answer higher level questions like what is the average sales of a certain group of products or the average number of earthquakes for a given time period. Let s put the latter case in an example. Consider we have a table called earthquakes and we would like to find the average number of earthquakes occurred on a year between 2000 and 2012:

We can answer the question as follows given the current structure of the table:
Output:
Though the output is not too bad, it is not very easy to follow either. The better way could be having the data in only 3 columns showing the year, magnitude, and number of earthquakes in separate rows. Though it takes more effort to do such conversion, the outcome is worth it. The horizontal to vertical pivoting in our case will take three steps:
First we need to create a new table that lists all of the columns from the original table, i.e. earthquakes, as rows.
Then cross join it with the original table to create an expanded view. Notice that each row in the earthquakes table is replicated 13 times; the resulting table has 130 rows (13 yr x 10 magnitude categories) and 130rows x 16cols = 2080 records
Finally, we use a
CASEstatement that pulls data from the correct column in the original table given the value in theyearcolumn:
And our table is ready! Below is an excerpt from the query:

Using a VIEW or by storing our new table in database, we can answer the question. above. The following query stores the new table as earthquakes_vertical:
It is time to answer the question:
Output:

The same result, but in a better format!
4. AGGREAGATION vs ARITHMETIC vs PIVOTING
Let's combine what we've seen so far. The following query displays the number of players in each state, based on their year in college, in terms of FR, SO, JR, and SR, in separate columns and another column that sums the the total number of players. Results are ordered by states that have the most players to the least number of players.
5. DISTINCT Values
In real-world scenarios, you will probably need to perform several preliminary queries to be able find out the best way for answering your initial question. Looking at the unique values on each column can help identify how you might want to group or filter the data. The following query counts the number of unique months per year:
6. CASTING: Changing Data Types
Can achieve this with two different type of syntax:
CAST(column_name AS integer)
column_name::integer
which produce the same result.
7. JOINs vs Subquery
8. COALESCE: Dealing with NULL values:
The COALESCE function is used in SQL to return the first non-null value in a list of expressions. This is particularly useful for dealing with scenarios where data may be incomplete, and you want to use a backup value when nulls are present.
Syntax:
Example usage:
In this example, result will be the value of column1 if it's not null, otherwise column2, and if both are null, it will be 'default value'. COALESCE is used especially for numerical datasets and when performing Outer JOINs to replace nulls with 0s due to unmatched rows.
Last updated
Was this helpful?