Data Science Hub
  • Data Science Hub
  • STATISTICS
    • Introduction
    • Fundamentals
      • Data Types
      • Central Tendency, Asymmetry, and Variability
      • Sampling
      • Confidence Interval
      • Hypothesis Testing
    • Distributions
      • Exponential Distribution
    • A/B Testing
      • Sample Size Calculation
      • Multiple Testing
  • Database
    • Database Fundamentals
    • Database Management Systems
    • Data Warehouse vs Data Lake
  • SQL
    • SQL Basics
      • Creating and Modifying Tables/Views
      • Data Types
      • Joins
    • SQL Rules
    • SQL Aggregate Functions
    • SQL Window Functions
    • SQL Data Manipulation
      • String Operations
      • Date/Time Operations
    • SQL Descriptive Stats
    • SQL Tips
    • SQL Performance Tuning
    • SQL Customization
    • SQL Practice
      • Designing Databases
        • Spotify Database Design
      • Most Commonly Asked
      • Mixed Queries
      • Popular Websites For SQL Practice
        • SQLZoo
          • World - BBC Tables
            • SUM and COUNT Tutorial
            • SELECT within SELECT Tutorial
            • SELECT from WORLD Tutorial
            • Select Quiz
            • BBC QUIZ
            • Nested SELECT Quiz
            • SUM and COUNT Quiz
          • Nobel Table
            • SELECT from Nobel Tutorial
            • Nobel Quiz
          • Soccer / Football Tables
            • JOIN Tutorial
            • JOIN Quiz
          • Movie / Actor / Casting Tables
            • More JOIN Operations Tutorial
            • JOIN Quiz 2
          • Teacher - Dept Tables
            • Using Null Quiz
          • Edinburgh Buses Table
            • Self join Quiz
        • HackerRank
          • SQL (Basic)
            • Select All
            • Select By ID
            • Japanese Cities' Attributes
            • Revising the Select Query I
            • Revising the Select Query II
            • Revising Aggregations - The Count Function
            • Revising Aggregations - The Sum Function
            • Revising Aggregations - Averages
            • Average Population
            • Japan Population
            • Population Density Difference
            • Population Census
            • African Cities
            • Average Population of Each Continent
            • Weather Observation Station 1
            • Weather Observation Station 2
            • Weather Observation Station 3
            • Weather Observation Station 4
            • Weather Observation Station 6
            • Weather Observation Station 7
            • Weather Observation Station 8
            • Weather Observation Station 9
            • Weather Observation Station 10
            • Weather Observation Station 11
            • Weather Observation Station 12
            • Weather Observation Station 13
            • Weather Observation Station 14
            • Weather Observation Station 15
            • Weather Observation Station 16
            • Weather Observation Station 17
            • Weather Observation Station 18
            • Weather Observation Station 19
            • Higher Than 75 Marks
            • Employee Names
            • Employee Salaries
            • The Blunder
            • Top Earners
            • Type of Triangle
            • The PADS
          • SQL (Intermediate)
            • Weather Observation Station 5
            • Weather Observation Station 20
            • New Companies
            • The Report
            • Top Competitors
            • Ollivander's Inventory
            • Challenges
            • Contest Leaderboard
            • SQL Project Planning
            • Placements
            • Symmetric Pairs
            • Binary Tree Nodes
            • Interviews
            • Occupations
          • SQL (Advanced)
            • Draw The Triangle 1
            • Draw The Triangle 2
            • Print Prime Numbers
            • 15 Days of Learning SQL
          • TABLES
            • City - Country
            • Station
            • Hackers - Submissions
            • Students
            • Employee - Employees
            • Occupations
            • Triangles
        • StrataScratch
          • Netflix
            • Oscar Nominees Table
            • Nominee Filmography Table
            • Nominee Information Table
          • Audible
            • Easy - Audible
          • Spotify
            • Worldwide Daily Song Ranking Table
            • Billboard Top 100 Year End Table
            • Daily Rankings 2017 US
          • Google
            • Easy - Google
            • Medium - Google
            • Hard - Google
        • LeetCode
          • Easy
  • Python
    • Basics
      • Variables and DataTypes
        • Lists
        • Dictionaries
      • Control Flow
      • Functions
    • Object Oriented Programming
      • Restaurant Modeler
    • Pythonic Resources
    • Projects
  • Machine Learning
    • Fundamentals
      • Supervised Learning
        • Classification Algorithms
          • k-Nearest Neighbors
            • kNN Parameters & Attributes
          • Logistic Regression
        • Classification Report
      • UnSupervised Learning
        • Clustering
          • Evaluation
      • Preprocessing
        • Scalers: Standard vs MinMax
        • Feature Selection vs Dimensionality Reduction
        • Encoding
    • Frameworks
    • Machine Learning in Advertising
    • Natural Language Processing
      • Stopwords
      • Name Entity Recognition (NER)
      • Sentiment Analysis
        • Agoda Reviews - Part I - Scraping Reviews, Detecting Languages, and Preprocessing
        • Agoda Reviews - Part II - Sentiment Analysis and WordClouds
    • Recommendation Systems
      • Spotify Recommender System - Artists
  • Geospatial Analysis
    • Geospatial Analysis Basics
    • GSA at Work
      • Web Scraping and Mapping
  • GIT
    • GIT Essentials
    • Connecting to GitHub
  • FAQ
    • Statistics
  • Cloud Computing
    • Introduction to Cloud Computing
    • Google Cloud Platform
  • Docker
    • What is Docker?
Powered by GitBook
On this page

Was this helpful?

  1. SQL

SQL Performance Tuning

There are two goals Performance Tuning aims to achieve:

  • to reduce the amount of time it takes a query to show results

  • to reduce the amount of resources used to process the query

Note that a database is no different that a regular software we run on our computers and therefore is subjected to limitations of our hardware.

In order to run a query faster, we need to reduce the number of calculations we perform. To do that, we should know what affects the number of calculations and subsequently the querys' runtime.

Parameters that are in your control and affect the calculations are as follows:

  • Table Size: Running queries on tables with hundreds (or thousands) of rows take shorter time than that of millions of rows.

  • Joins: Our queries will likely to be slower, if we perform joins on two tables in a way that the resulting table has substantially increased row count.

  • Aggregations: Aggregating multiple rows to derive a result demands more computational effort than merely fetching those individual rows.

Parameters that are not in your control and yet affect the calculations are as follows:

  • Queries by Other Users: When multiple users are executing queries concurrently on a database, the database needs to handle a higher processing load, leading to slower overall performance. This becomes particularly problematic if some users are running resource-intensive queries that meet the criteria mentioned above.

  • Database Software: Having a good understanding of the system you are working with allows you to optimize your queries within its limitations for improved efficiency.

For better perfomance:

  • Reduce tables size: Perform exploratory data analysis on a subset of data. For instance, if we have a time series data, we can work with a smaller time window first. We can also LIMIT the row count to certain number during exploration phase. Once we complete the final version of our query, we can run it on the entire dataset.

Keep in mind that if you are performing aggregation, i.e. COUNT(*), and use LIMIT to increase the speed, it wouldn't work. In order to speed our query up, we need to limit the dataset in a subquery before applying the COUNT(*) .

SELECT COUNT(*)
FROM 
(
  SELECT *
  FROM employees
  LIMIT 100
) AS tbl

is faster than

SELECT COUNT(*)
FROM employees
LIMIT 100

Applying the LIMIT function as in the former query, i.e. in the subquery, is a very common approach for testing query logic. However, it will result in significant changes to your results. As a best practice, use it primarily for testing rather than obtaining actual results.

  • Perform Simpler Joins: If we were to perform an aggregation after joining two table, we should try to do it before the JOIN occurs if possible.

Let’s check the runtime difference between the two following queries that count the number Walmart employees in the US for each department per region (hypothetically). Keep in mind that Walmart has over 1.5M associates in the US. This means that the first query needs to go through 1.5M rows to check matches. In the second query, this number drops down to hundreds (if not tens, i.e. number of departments) and the perform the JOIN, which means latter runs in a much faster manner.

SELECT 
   d.region,
   e.dept_name,
   COUNT(*) AS employee_counts
FROM walmart_employees_us e 
JOIN walmart_departments d
ON e.dept_name = d.dept_name
GROUP BY 1,2

Instead aggreate before the JOIN:

SELECT 
   d.region,
   t.*    
FROM 
   (SELECT
      dept_name,
      COUNT(*) AS employee_counts
   FROM walmart_employees_us
   GROUP BY 1) AS t
JOIN walmart_departments d
ON t.dept_name = d.dept_name

EXPLAIN

The EXPLAIN command displays execution plan generated by the PostgreSQL planner for the given statement. We simply add the command in the begining of the query shown below:

The output is called the Query Plan, and though the results are not perfectly accurate, it's still a useful tool and shows the order, bottom to top, in which the query is being executed. The costs listed in the steps indicator of the runtimes and higher the number, the longer runtime.

We get the most benefit from the EXPLAIN when we run a query, pinpoint expensive steps, modify them and re-run EXPLAIN to if we have reduced the cost. Once again, EXPLAIN becomes especially helpful when run queries on large tables (with millions of rows or more), perform complex JOINs or Aggregations.

Last updated 1 year ago

Was this helpful?

Check out for more details on Explain.

Postgres Documentation