Data Science Hub

Supervised Learning

Last updated 1 year ago

1. Description

Supervised learning is a type of machine learning in which an algorithm learns a mapping or relationship between input data and corresponding output labels from a labeled training dataset. In other words, it involves training a model to make predictions or classifications based on input features while having access to the correct answers during training. The goal of supervised learning is to learn a mapping function that can generalize to make accurate predictions on new, unseen data.

There are two primary types of supervised learning:

  1. Regression:

    • Regression involves predicting a continuous numerical value or quantity based on input features. In regression, the output is a real number rather than a discrete category. The model learns to approximate the relationship between input variables and a target numerical value.

    • Examples of regression tasks include:

      • Predicting housing prices based on features like square footage, number of bedrooms, and location.

      • Forecasting stock prices based on historical market data.

      • Estimating the age of a person based on demographic information.

    Regression algorithms include linear regression, polynomial regression, support vector regression, and regularized variants such as ridge and lasso regression.

  2. Classification:

    • In classification, the goal is to categorize input data into discrete classes or categories. Each data point is associated with a specific class label. The model's objective is to learn the decision boundaries that separate different classes in the feature space.

    • Examples of classification tasks include:

      • Email spam detection (classifying emails as spam or not spam).

      • Image classification (e.g., classifying images of animals into different species).

      • Sentiment analysis (classifying text as positive, negative, or neutral).

    Common algorithms for classification include logistic regression, decision trees, random forests, support vector machines, and deep learning techniques like neural networks.
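The difference between the two task types can be sketched with scikit-learn. This is a minimal illustration on made-up synthetic data, not a recommended modeling setup; the variable names and the numbers used to generate the data are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic, made-up data: 20 samples with a single input feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))

# Regression: the target is a continuous number.
y_continuous = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=20)
regressor = LinearRegression().fit(X, y_continuous)
predicted_value = regressor.predict([[5.0]])  # a real number close to 3*5 + 2

# Classification: the target is a discrete class label (0 or 1).
y_label = (X[:, 0] > 5.0).astype(int)
classifier = LogisticRegression().fit(X, y_label)
predicted_class = classifier.predict([[9.0]])  # one of the labels 0 or 1

print(predicted_value, predicted_class)
```

Both models share the same `fit` / `predict` interface; only the type of the target changes.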

2. Algorithms

The major supervised learning algorithms can be grouped into regression and classification algorithms.

2.1. Regression Algorithms:

  1. Linear Regression

    1. Ridge

    2. Lasso

  2. Multiple Linear Regression

  3. Polynomial Regression
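As a rough sketch, the regression algorithms listed above map onto scikit-learn estimators as follows. The synthetic data, the `alpha` values, and the polynomial degree are assumptions picked for illustration, and the score is computed on the training data only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with two input features (so this is multiple linear regression)
# and a quadratic term that a purely linear model cannot capture.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.2, size=100)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),    # linear regression with an L2 penalty
    "lasso": Lasso(alpha=0.1),    # linear regression with an L1 penalty
    # Polynomial regression: expand the features, then fit a linear model.
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R^2 on the training data

print(scores)
```

In practice the scores would be computed on held-out data rather than on the data used for fitting.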

2.2. Classification Algorithms:

  1. K-Nearest Neighbors (K-NN) (can also be used for regression)

  2. Logistic Regression

  3. Decision Trees

    • Random Forests (can also be used for regression)

    • Gradient Boosting Machines (GBM) (can also be used for regression)

  4. Support Vector Machines (SVM)

  5. Naive Bayes

    • Gaussian

    • Bernoulli

    • Multinomial

  6. Neural Networks (Deep Learning)

    • Convolutional Neural Networks (CNNs)

    • Recurrent Neural Networks (RNNs)

  7. Linear Discriminant Analysis (LDA)
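Most of the classifiers listed above have a counterpart in scikit-learn, and they all follow the same `fit` / `score` pattern. The sketch below assumes a synthetic dataset from `make_classification` and default hyperparameters. It leaves out neural networks, since CNNs and RNNs are usually built with a dedicated deep learning framework, and it shows only the Gaussian variant of Naive Bayes.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logistic_regression": LogisticRegression(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": SVC(),
    "gaussian_naive_bayes": GaussianNB(),
    "lda": LinearDiscriminantAnalysis(),
}

# Accuracy on the training data, just to show that every model exposes the same API.
train_accuracy = {name: clf.fit(X, y).score(X, y) for name, clf in classifiers.items()}

for name, acc in train_accuracy.items():
    print(f"{name}: {acc:.2f}")
```

Training accuracy is not a measure of generalization; a fair comparison would use held-out data.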

Steps

The key steps in supervised learning include:

  1. Data Collection: Gathering a labeled dataset consisting of input features and corresponding output labels.

  2. Data Preprocessing: Preparing and cleaning the data, which may involve tasks like feature scaling, handling missing values, and encoding categorical variables.

Most scikit-learn estimators expect the data to meet the following requirements:

Data:

  1. No missing values

  2. All features numeric (for example, 1 or 0 instead of Yes/No or True/False), i.e. categorical features must be encoded first

  3. Features must be formatted as a 2D array:

    • either a pandas DataFrame or a NumPy 2D array

  4. Target should be a 1D array:

    • y = data['target'].values

    • y = np.ravel(y) -> to flatten y to 1D if it is not already
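The requirements above can be sketched on a small, hypothetical DataFrame. The column names (`age`, `smoker`, `target`) and the choice of median imputation are assumptions made for this example; other imputation and encoding strategies are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one missing value and one Yes/No column.
data = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "smoker": ["Yes", "No", "No", "Yes"],
    "target": [1, 0, 0, 1],
})

# 1. No missing values: fill the gap with the column median (one possible strategy).
data["age"] = data["age"].fillna(data["age"].median())

# 2. All numeric: encode Yes/No as 1/0.
data["smoker"] = data["smoker"].map({"Yes": 1, "No": 0})

# 3. Features as a 2D structure: a DataFrame with one column per feature.
X = data[["age", "smoker"]]

# 4. Target as a 1D array.
y = np.ravel(data["target"].values)

print(X.shape, y.shape)
```

Selecting columns with a list (`data[["age", "smoker"]]`) keeps `X` two-dimensional even when there is only one feature.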

  3. Model Selection: Choosing an appropriate supervised learning algorithm based on the nature of the problem and the dataset.

  4. Training: Using the labeled training data to train the selected model. During training, the model adjusts its parameters to minimize the prediction error.

  5. Evaluation: Assessing the model's performance using evaluation metrics such as accuracy, precision, recall, mean squared error, or others, depending on the task.

  6. Testing: Testing the trained model on unseen data (the testing dataset) to measure its ability to generalize to new examples.

  7. Deployment: Deploying the trained model in real-world applications to make predictions or classifications.
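The steps from data collection through testing can be sketched end to end with scikit-learn. This is a minimal example under stated assumptions: it uses the built-in Iris dataset as the labeled data, k-nearest neighbors as the chosen model, a 80/20 train/test split, and 5-fold cross-validation for the evaluation step. Deployment is not covered.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a small labeled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the unseen testing dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing + model selection: scale the features, then apply k-NN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Evaluation: cross-validated accuracy computed on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Training: fit the pipeline on the full training set.
model.fit(X_train, y_train)

# Testing: measure generalization on data the model has never seen.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"CV accuracy: {cv_scores.mean():.3f}, test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaler is fitted only on the training portion of each split, which avoids leaking information from the held-out data.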

Supervised learning is widely used in various domains, including natural language processing, computer vision, healthcare, finance, and more, where there is a need to make predictions or decisions based on historical data and labeled examples.

We will be using one of the most popular machine learning packages, called scikit-learn. It is known for its easy-to-use interface and its range of functions and methods for building and training machine learning models.