Unsupervised Learning

Unsupervised learning is a type of machine learning where the algorithm is trained on a dataset without explicit supervision, meaning that there are no labeled output/target variables to guide the learning process. Instead, the algorithm tries to find patterns, structures, or relationships within the data on its own. Unsupervised learning is particularly useful for tasks where the goal is to discover hidden patterns or groupings in data, reduce dimensionality, or perform data compression.

There are three main areas where unsupervised learning is used extensively:

  1. Clustering:

Clustering algorithms aim to group similar data points together into clusters or categories. The algorithm identifies inherent structures in the data based on similarities or dissimilarities between data points.

  • Common clustering algorithms include:

    • K-Means: Partitions data points into K clusters by assigning each point to the cluster whose centroid (the mean of its members) is nearest.

    • Hierarchical Clustering: Builds a hierarchy of clusters by recursively merging or splitting clusters.

    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Clusters data points based on their density and proximity.

    • Gaussian Mixture Models (GMM): Models data as a mixture of Gaussian distributions.

Applications of clustering include customer segmentation, image segmentation, and anomaly detection.
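
As a quick illustration, here is a minimal K-Means sketch. The use of scikit-learn, the synthetic blob data, and the choice of K=3 are assumptions made for demonstration, not details from this page:

```python
# A minimal K-Means sketch (scikit-learn and the synthetic blob data are
# illustrative assumptions, not part of this page).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: 300 points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with K=3 and assign each point to a cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Silhouette score:", round(silhouette_score(X, labels), 3))
```

Choosing K is a modeling decision; the silhouette score used above is one common heuristic for comparing candidate values of K.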

  2. Dimensionality Reduction:

Dimensionality reduction techniques aim to reduce the number of input features while preserving important information. This is often done to simplify data, remove noise, or improve computational efficiency.

  • Common dimensionality reduction methods include:

    • Principal Component Analysis (PCA): Linear dimensionality reduction technique that identifies orthogonal axes (principal components) that capture the most variance in the data.

    • t-Distributed Stochastic Neighbor Embedding (t-SNE): Non-linear dimensionality reduction method that focuses on preserving pairwise similarities between data points in low-dimensional space.

    • Autoencoders: Neural network-based techniques for learning compact representations of data.

Dimensionality reduction is used in various applications, including data visualization, feature engineering, and speeding up machine learning algorithms.
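
For example, a minimal PCA sketch might look like the following. The scikit-learn library, the Iris dataset, and the choice of two components are assumptions for illustration only:

```python
# A minimal PCA sketch (scikit-learn and the Iris dataset are illustrative
# assumptions, not part of this page).
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples x 4 features

# Standardize so each feature contributes on a comparable scale
X_scaled = StandardScaler().fit_transform(X)

# Project onto the 2 orthogonal directions that capture the most variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Reduced shape:", X_2d.shape)  # (150, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained_variance_ratio_ attribute shows how much of the original variance each retained component captures, which helps decide how many components to keep.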

  3. Association Rule Mining:

Association rule mining discovers relationships or patterns between items in large datasets without any predefined labels or target variables. For example, it is often used in market basket analysis to find products that are frequently purchased together.

  • Common algorithms for association rule mining include:

    • Apriori Algorithm: A classic algorithm that finds frequent itemsets by iteratively scanning the dataset and pruning non-frequent itemsets, generating association rules based on support and confidence thresholds.

    • Eclat Algorithm: An efficient algorithm that uses a vertical data format and depth-first search, finding frequent itemsets by intersecting transaction ID lists.

    • Frequent Pattern Growth (FP-Growth) Algorithm: A fast, memory-efficient algorithm (typically faster than both Apriori and Eclat) that avoids candidate generation by building a compressed tree structure (an FP-tree) to mine frequent itemsets and generate association rules.
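
A minimal market-basket sketch is shown below. It assumes the mlxtend library and a handful of toy transactions, neither of which comes from this page:

```python
# A minimal market-basket sketch. The mlxtend library and the toy baskets
# below are assumptions made for illustration only.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each inner list is one shopping basket (one transaction)
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode baskets into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets with >= 60% support, then rules filtered by confidence
frequent_itemsets = apriori(onehot, min_support=0.6, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

Support and confidence thresholds are tuning knobs: tighter thresholds yield fewer but stronger rules.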

Unsupervised learning is particularly valuable in scenarios where the data lacks clear labels or where the goal is to explore and discover underlying structures. Some common use cases for unsupervised learning include:

  • Market segmentation: Identifying distinct customer groups based on purchasing behavior.

  • Image segmentation and compression: Detecting objects in an image and reducing the storage space required for images while preserving their quality.

  • Anomaly detection: Detecting unusual patterns or outliers in data, which could indicate fraud or errors.

  • Natural Language Processing: Extracting latent topics from a collection of documents.

  • Recommendation Systems: Clustering news articles into topics for recommendation or categorization.

  • Dimensionality Reduction: Reducing the dimensionality of data to improve the efficiency and interpretability of machine learning models.

Unsupervised learning is an essential component of the machine learning toolbox and complements supervised learning, where labeled data is used for tasks like classification and regression.
