
7.2 Classification and Clustering Algorithms

3 min read • July 18, 2024

Classification and clustering algorithms are core tools in machine learning. They help us make sense of data by organizing it into meaningful groups or categories. From supervised methods like decision trees to unsupervised approaches like k-means clustering, each algorithm has its own strengths.

Evaluating model performance is essential for applying these algorithms well. Classification metrics like accuracy and F1 score measure how well models predict labels, while clustering metrics like the silhouette coefficient assess group cohesion. Understanding these evaluation techniques helps us choose and fine-tune the best algorithm for each task.

Classification Algorithms

Supervised vs unsupervised learning

  • Supervised learning trains models on labeled data: the model learns a mapping from input features to known output labels (classification, regression) so that it can predict the correct labels for new, unseen data
  • Unsupervised learning trains models on unlabeled data: with no predefined labels, the model looks for structure in the data itself (clustering, dimensionality reduction) to discover hidden patterns or groupings

Classification algorithm applications

  • Decision trees construct tree-like models with internal nodes representing features, branches representing decision rules, and leaf nodes representing class labels, recursively partitioning data based on feature values until a stopping criterion is met (ID3, C4.5, CART)
  • Naive Bayes, a probabilistic classifier based on Bayes' theorem, assumes features are conditionally independent given the class label and calculates the posterior probability of each class to select the class with the highest probability (Gaussian, Multinomial, Bernoulli)
  • K-nearest neighbors (KNN), an instance-based learning algorithm, classifies each new instance by the majority class of its k nearest neighbors in the feature space using a distance metric (Euclidean, Manhattan, Minkowski), with the choice of k affecting performance and boundary smoothness
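
The KNN procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbors (sorted by distance only)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two well-separated groups in 2-D
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_X, train_y, (2.0, 2.0), k=3)
print(prediction)
```

A small k follows the training data closely and gives a jagged decision boundary, while a larger k averages over more neighbors and smooths it. An odd k avoids tied votes when there are two classes.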

Clustering Algorithms

Clustering technique implementation

  • K-means clustering, a partitional algorithm, partitions n observations into k clusters where each observation belongs to the cluster with the nearest mean (centroid), iteratively assigning points and updating centroids until convergence; it is sensitive to the initial centroid positions and may converge to a local optimum
  • Hierarchical clustering builds a hierarchy of clusters by merging smaller clusters (agglomerative) or dividing larger clusters (divisive) using linkage criteria (single, complete, average, Ward's), producing a dendrogram representing the clustering hierarchy
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters as high-density regions separated by low-density regions, defining core points, border points, and noise points based on the number of neighbors within a specified radius (ε) and forming clusters by connecting core points and their neighboring core points; it is robust to noise and can discover arbitrarily shaped clusters
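
The assign-then-update loop of k-means can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `kmeans`, the toy points, and the hand-picked starting centroids are hypothetical, and real implementations usually add smarter initialization and multiple restarts.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, assignments) after iterating until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the index of its nearest centroid
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Converged: no point changed cluster since the previous iteration
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Toy, made-up data: two tight groups in 2-D
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Because the result depends on `initial_centroids`, a common remedy for poor local optima is to run the algorithm from several starting positions and keep the solution with the lowest within-cluster distance.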

Performance evaluation of models

  • Classification metrics
    • Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$, where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives
    • Precision: $\frac{TP}{TP + FP}$
    • Recall (Sensitivity): $\frac{TP}{TP + FN}$
    • F1 score: $2 \times \frac{Precision \times Recall}{Precision + Recall}$
    • ROC curve and AUC (Area Under the Curve)
  • Clustering metrics
    • Silhouette coefficient measures how well each point fits into its assigned cluster compared to other clusters
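    • Davies-Bouldin index measures the ratio of within-cluster distances to between-cluster distances (lower values indicate better-separated clusters)
    • Calinski-Harabasz index measures the ratio of between-cluster dispersion to within-cluster dispersion (higher values indicate better-defined clusters)
    • External indices such as the adjusted Rand index measure the similarity between clustering results and ground truth labels
    • Homogeneity, completeness, and V-measure assess the alignment between clustering results and ground truth labels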
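
As a rough illustration of the metrics above, the sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts, plus a mean silhouette coefficient. The function names, the toy labels, and the conventions for edge cases (returning 0.0 when a denominator is zero, scoring singleton clusters as 0) are assumptions made for this example.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster
        same = [math.dist(p, q)
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(ds) / len(ds)
            for ds in (
                [math.dist(p, q) for q, lab in zip(points, labels) if lab == other]
                for other in set(labels) if other != own
            )
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: 3 TP, 1 FN, 1 FP, 3 TN
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)

# Toy, made-up clustering of two tight groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
print(metrics, good)
```

Silhouette values range from -1 to 1. Values near 1 mean points sit much closer to their own cluster than to the nearest other cluster, so a well-separated grouping like the one above scores high.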
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

