Classification Report

A classification report provides several important metrics for evaluating the performance of a classification model. The exact functions for generating one vary slightly across machine learning libraries and frameworks; scikit-learn, for example, provides the sklearn.metrics.classification_report function.
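
A minimal usage sketch (the y_true and y_pred labels below are hypothetical):

    from sklearn.metrics import classification_report

    # Hypothetical ground truth and predictions for a binary spam classifier
    y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

    # Prints per-class precision, recall, F1-score, and support,
    # plus the accuracy, macro avg, and weighted avg rows
    print(classification_report(y_true, y_pred, target_names=["ham", "spam"]))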

However, the concept and purpose remain the same: to evaluate the performance of classification models and provide metrics like precision, recall, F1-score, and support (a worked numeric sketch follows the list below):

  1. Precision (Positive Predictive Value - PPV): Precision is the ratio of correctly predicted positive instances to the total instances predicted as positive. In other words, it measures the accuracy of the positive predictions made by the model. A higher precision indicates fewer false positives.

    Precision = TP / (TP + FP)

    Where:

    • TP (True Positives) is the number of instances correctly predicted as positive.

    • FP (False Positives) is the number of instances incorrectly predicted as positive (i.e., negative instances that were predicted as positive).

    • Example: Number of correctly labeled SPAM emails / the total number of emails classified as SPAM

      • High precision means the classifier produced few false positives; that is, not many legitimate emails were classified as spam.

  2. Recall (Sensitivity, Hit Rate, or True Positive Rate): Recall is the ratio of correctly predicted positive instances to the total actual positive instances in the dataset. It measures the model's ability to find all positive instances. A higher recall indicates fewer false negatives.

    Recall = TP / (TP + FN)

    Where:

    • TP (True Positives) is the number of instances correctly predicted as positive.

    • FN (False Negatives) is the number of instances incorrectly predicted as negative (i.e., positive instances that were predicted as negative).

    • Example: Number of correctly labeled SPAM emails / the total number of SPAM emails

      • High recall means the classifier correctly identified most of the actual spam emails.

  3. F1-Score: The F1-score is the harmonic mean of precision and recall, providing a single measure that balances the two. It is particularly useful when you want to trade off minimizing false positives against minimizing false negatives.

    F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

  4. Support: Support is the number of instances in each class. It represents the actual number of instances that belong to each class in the dataset.

  5. Accuracy: Accuracy is a straightforward metric that measures the overall correctness of the model's predictions across all classes. It's the ratio of correctly classified instances to the total number of instances.

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

  6. Macro Average (Macro Avg): Macro average calculates the unweighted mean of the class-specific metrics (e.g., precision, recall, F1-score) across classes. Each class contributes equally to the macro average, regardless of its size or frequency. For example:

    Macro Avg Precision = (Precision_class_1 + Precision_class_2 + ... + Precision_class_n) / n

  7. Weighted Average (Weighted Avg): Weighted average calculates the mean of the class-specific metrics, but it takes into account the number of instances in each class. Classes with more instances have a greater impact on the weighted average.

    Weighted Avg Precision = (Precision_class_1 * Support_class_1 + Precision_class_2 * Support_class_2 + ... + Precision_class_n * Support_class_n) / (Support_class_1 + Support_class_2 + ... + Support_class_n)
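
Putting these definitions together, here is a small worked sketch; the confusion counts are hypothetical:

    # Hypothetical binary confusion counts for a spam classifier
    # (spam is the positive class)
    TP, FP, FN, TN = 40, 10, 20, 30

    precision = TP / (TP + FP)                    # 40 / 50 = 0.80
    recall    = TP / (TP + FN)                    # 40 / 60 ≈ 0.67
    f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.73
    accuracy = (TP + TN) / (TP + TN + FP + FN)    # 70 / 100 = 0.70

    # Per-class precision when each class in turn is treated as positive
    prec_spam = TP / (TP + FP)                    # 0.80
    prec_ham  = TN / (TN + FN)                    # 30 / 50 = 0.60

    support_spam = TP + FN                        # 60 actual spam emails
    support_ham  = TN + FP                        # 40 actual ham emails

    macro_prec = (prec_spam + prec_ham) / 2       # 0.70
    weighted_prec = (prec_spam * support_spam +
                     prec_ham * support_ham) / (support_spam + support_ham)  # 0.72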

In summary:

  • Precision measures the accuracy of positive predictions.

  • Recall measures the model's ability to find all positive instances.

  • F1-Score balances precision and recall.

  • Support indicates the number of instances in each class.

  • Accuracy measures the overall correctness of predictions.

  • Macro Avg calculates unweighted averages of class-specific metrics and treats all classes equally.

  • Weighted Avg averages the class-specific metrics, weighting each class by its number of instances.

The classification report is a valuable tool for understanding the performance of a classification model, especially in cases where the class distribution is imbalanced or when different trade-offs between precision and recall are required.

A confusion matrix is a table that shows the counts for every combination of predicted and actual class; in the binary case, these counts are the TP, TN, FP, and FN used above.
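
In scikit-learn, these counts come from sklearn.metrics.confusion_matrix; a minimal sketch reusing the hypothetical labels from above:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

    # scikit-learn's convention: rows are actual classes, columns are
    # predicted classes, so the binary layout is [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(tn, fp, fn, tp)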

Different trade-offs between precision and recall

Trade-offs between precision and recall are common in classification tasks, and the choice between them depends on the specific goals and requirements of your application. Here are some examples of different trade-offs between precision and recall:

  1. High Precision, Low Recall (Conservative Approach):

    • Example: Spam Email Filter

    • Goal: Minimize false positives (genuine emails classified as spam).

    • Trade-off: Some spam emails may still end up in the inbox (lower recall), but users won't miss important emails (higher precision).

  2. High Recall, Low Precision (Liberal Approach):

    • Example: Cancer Detection

    • Goal: Detect as many true positives (cancer cases) as possible.

    • Trade-off: More false positives (healthy individuals classified as having cancer) may occur, leading to unnecessary medical tests or treatments (lower precision).

  3. Balanced Precision and Recall:

    • Example: Fraud Detection

    • Goal: Accurately detect fraudulent transactions while keeping false alarms to a minimum.

    • Trade-off: Striking a balance between missing some fraud cases (lower recall) and minimizing false alarms (higher precision).

  4. Precision-Recall Trade-off in Thresholding:

    • Example: Sentiment Analysis in Social Media

    • Goal: Identify positive sentiment in user comments.

    • Trade-off: By adjusting the classification threshold, you can increase precision by being more conservative (e.g., only classify strongly positive comments as positive) or increase recall by being more liberal (e.g., classify most comments as positive, including some that are only mildly positive). A sketch of this threshold adjustment follows the list below.

  5. Medical Diagnostics with Different Thresholds:

    • Example: Medical tests for diseases

    • Goal: Use different thresholds for test results to balance precision and recall. For example, a higher threshold might be used for initial screening to ensure high precision, and a lower threshold for confirmatory tests to improve recall.

  6. Anomaly Detection with Varying Sensitivity:

    • Example: Network Intrusion Detection

    • Goal: Adjust the sensitivity of intrusion detection systems to identify unusual network behavior.

    • Trade-off: By changing detection thresholds, you can balance between catching more anomalies (higher recall) and reducing false alarms (higher precision).

  7. Information Retrieval in Search Engines:

    • Example: Search engine ranking and retrieval

    • Goal: Provide relevant search results to users.

    • Trade-off: Search engines often provide a mix of results with different precision and recall levels, with highly relevant results at the top (higher precision) and more results further down the list (higher recall).
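
To illustrate the thresholding trade-off from item 4, here is a minimal sketch on synthetic data; it assumes a scikit-learn-style classifier with predict_proba, and the dataset and model choice are stand-ins, not part of the original text:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Synthetic data stands in for a real sentiment or spam dataset
    X, y = make_classification(n_samples=1000, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

    model = LogisticRegression().fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]   # probability of the positive class

    # Raising the threshold above the default 0.5 makes the classifier more
    # conservative: precision tends to rise while recall falls
    for threshold in (0.3, 0.5, 0.7):
        y_pred = (proba >= threshold).astype(int)
        print(threshold,
              round(precision_score(y_te, y_pred), 2),
              round(recall_score(y_te, y_pred), 2))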

These examples illustrate that the choice between precision and recall depends on the specific context and goals of the classification task. It's essential to understand the trade-offs and select the appropriate balance to meet the requirements of your application.

Credit:
Jerry An