
Unsupervised learning uncovers hidden patterns in data without predefined outputs. It's useful when labeled data is scarce or expensive. Common tasks include clustering, which groups similar data points, and dimensionality reduction, which simplifies complex datasets.

This section covers two clustering methods: k-means and hierarchical clustering. It also explores Principal Component Analysis (PCA) for dimensionality reduction. These techniques help reveal data structures, visualize high-dimensional data, and preprocess information for further analysis.

Unsupervised Learning Goals and Applications

Discovering Hidden Patterns and Structures

  • Unsupervised learning aims to discover hidden patterns or structures in unlabeled data without prior knowledge of the desired output
  • Algorithms do not require labeled data, making them suitable for scenarios where obtaining labeled data is expensive, time-consuming, or infeasible (text data, image collections)
  • Can serve as a preprocessing step for supervised learning tasks by providing useful insights into the data or reducing the dimensionality of the input space
  • Helps identify intrinsic properties, similarities, or relationships within the data that may not be apparent or easily observable

Common Unsupervised Learning Tasks

  • Clustering is a common unsupervised learning task that groups similar data points together based on their intrinsic properties or similarities
    • Applications of clustering include customer segmentation (market research), anomaly detection (fraud detection), image segmentation (object recognition), and document clustering (topic modeling)
  • Dimensionality reduction is another unsupervised learning task that aims to reduce the number of features in high-dimensional data while preserving the most important information
    • Applications of dimensionality reduction include data visualization (scatter plots), noise reduction (signal processing), feature extraction (face recognition), and data compression (image compression)

K-means Clustering for Data Grouping

K-means Algorithm Overview

  • K-means clustering is a partitional clustering algorithm that aims to partition n data points into k clusters, where each data point belongs to the cluster with the nearest mean (centroid)
  • The algorithm iteratively assigns data points to the nearest centroid and updates the centroids based on the mean of the assigned data points until convergence (a minimal sketch follows this list)
    • Initialization: Randomly select k data points as initial centroids
    • Assignment: Assign each data point to the nearest centroid based on a distance metric (Euclidean distance)
    • Update: Recalculate the centroids by taking the mean of the data points assigned to each cluster
    • Repeat the assignment and update steps until the centroids no longer change significantly or a maximum number of iterations is reached
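A minimal from-scratch sketch of this loop in Python; the synthetic data `X`, the tolerance, and k = 2 are illustrative assumptions, not part of the original text:

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Minimal k-means: random initialization, assign, update, repeat until convergence."""
    rng = np.random.default_rng(seed)
    # Initialization: randomly select k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment: each point goes to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update: recompute each centroid as the mean of its assigned points
        # (for simplicity, assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroids no longer change significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on synthetic two-cluster data
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
```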

Considerations and Limitations

  • The choice of the number of clusters (k) is a hyperparameter that needs to be determined beforehand, often based on domain knowledge or using techniques like the elbow method or silhouette analysis (see the snippet after this list)
  • K-means clustering is sensitive to the initial placement of centroids and may converge to different local optima depending on the initialization. Multiple runs with different initializations can help find a better solution
  • The algorithm assumes that clusters are spherical, equally sized, and have similar densities, which may not always hold true in real-world datasets (elongated clusters, varying cluster sizes)
  • K-means works well for datasets with well-separated, compact clusters but may struggle with overlapping or non-convex clusters
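A hedged scikit-learn sketch that touches both points above: it evaluates several candidate values of k with the silhouette score (and inertia for the elbow method), while `n_init=10` reruns the algorithm from ten random initializations and keeps the best result. The synthetic dataset is an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic two-cluster data for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Try several candidate values of k; n_init=10 runs ten random initializations
# and keeps the best one, mitigating sensitivity to initial centroid placement.
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    # inertia_ (within-cluster sum of squares) supports the elbow method
    print(f"k={k}: inertia={km.inertia_:.1f}, silhouette={score:.3f}")
```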

Hierarchical Clustering vs K-means

Hierarchical Clustering Overview

  • Hierarchical clustering is a clustering approach that builds a hierarchy of clusters by either merging smaller clusters into larger ones (agglomerative) or dividing larger clusters into smaller ones (divisive)
    • Agglomerative clustering starts with each data point as a separate cluster and iteratively merges the closest clusters until a single cluster is formed
    • Divisive clustering starts with all data points in a single cluster and recursively splits the clusters into smaller ones until each data point forms its own cluster
  • The hierarchical structure of the clusters can be visualized using a dendrogram, which shows the merging or splitting of clusters at different levels of similarity
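A small SciPy sketch of agglomerative clustering and its dendrogram; the synthetic data and the choice of Ward linkage are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Synthetic two-cluster data for illustration
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 4])

# Agglomerative clustering: linkage() repeatedly merges the two closest clusters
Z = linkage(X, method="ward")

# The dendrogram shows the merge distance at every level of the hierarchy
dendrogram(Z)
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()

# Cutting the tree yields a flat clustering, here into 2 clusters
flat_labels = fcluster(Z, t=2, criterion="maxclust")
```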

Advantages of Hierarchical Clustering

  • Hierarchical clustering does not require specifying the number of clusters in advance, allowing for more flexibility in exploring different levels of granularity in the data
  • Can handle non-spherical and non-convex clusters, as it does not make assumptions about the shape or size of the clusters
  • The algorithm can use various linkage criteria to determine the similarity between clusters, such as single linkage (nearest neighbor), complete linkage (farthest neighbor), or average linkage (average distance between all pairs of points)
  • Provides a more informative and interpretable representation of the clustering structure compared to flat clustering methods like k-means

Limitations of Hierarchical Clustering

  • Hierarchical clustering has a higher computational complexity compared to k-means, making it less suitable for large datasets (time complexity of O(n^3) for agglomerative clustering)
  • Does not provide a direct way to assign new data points to existing clusters, requiring the re-computation of the entire hierarchy
  • The choice of linkage criteria can have a significant impact on the resulting clustering structure and may require domain knowledge or experimentation to determine the most appropriate one
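One way to experiment with that choice is to fit the same data with several linkage criteria and compare the results. A hedged scikit-learn sketch, with synthetic data as an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic two-cluster data for illustration
X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4])

# Fit the same data with three linkage criteria and compare the cluster sizes
for link in ["single", "complete", "average"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    print(f"{link:>8} linkage -> cluster sizes {np.bincount(labels)}")
```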

Dimensionality Reduction with PCA

Principal Component Analysis (PCA) Overview

  • Principal Component Analysis (PCA) is a linear dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving the most important information
  • PCA identifies the principal components, which are orthogonal directions in the feature space that capture the maximum variance in the data
    • The first principal component captures the direction of maximum variance, the second principal component captures the direction of maximum variance orthogonal to the first, and so on
  • The steps to perform PCA include (a code sketch follows this list):
    • Standardize the data by subtracting the mean and dividing by the standard deviation of each feature
    • Compute the covariance matrix of the standardized data
    • Perform eigendecomposition on the covariance matrix to obtain the eigenvectors (principal components) and eigenvalues
    • Sort the eigenvectors in descending order based on their corresponding eigenvalues
    • Select the top k eigenvectors to form the projection matrix
    • Transform the original data into the lower-dimensional space by multiplying it with the projection matrix
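A minimal NumPy sketch that follows these steps directly; the data and the choice of k = 2 components are illustrative assumptions:

```python
import numpy as np

def pca(X, k):
    # Standardize: subtract the mean and divide by the standard deviation of each feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the standardized data
    cov = np.cov(X_std, rowvar=False)
    # Eigendecomposition (eigh suits the symmetric covariance matrix)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort eigenvectors in descending order of their eigenvalues
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Select the top k eigenvectors as the projection matrix
    W = eigenvectors[:, :k]
    # Project the data onto the lower-dimensional space
    return X_std @ W, eigenvalues

# Example usage: reduce 5-dimensional data to 2 principal components
X = np.random.randn(100, 5)
X_reduced, eigenvalues = pca(X, k=2)
```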

Considerations and Limitations

  • The number of principal components to retain can be determined based on the cumulative explained variance ratio, which measures the proportion of total variance captured by the selected components (see the snippet after this list)
  • PCA assumes that the data is linearly separable and that the principal components are orthogonal to each other. It may not capture non-linear relationships or preserve the original data structure (manifold structure)
  • PCA is sensitive to the scale of the features, so it is important to standardize the data before applying PCA to ensure equal contribution of each feature to the variance
  • Interpreting the principal components can be challenging, as they are linear combinations of the original features and may not have a clear physical or semantic meaning
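A hedged scikit-learn snippet for choosing the number of components from the cumulative explained variance ratio; the 95% threshold and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Illustrative data: 200 samples with 10 features
X = np.random.randn(200, 10)

# Standardize first, since PCA is sensitive to feature scale
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and inspect the cumulative explained variance ratio
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Keep the smallest number of components explaining at least 95% of the variance
k = int(np.argmax(cumulative >= 0.95)) + 1
print(f"components retained: {k}, cumulative variance: {cumulative[k - 1]:.2f}")
X_reduced = PCA(n_components=k).fit_transform(X_std)
```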