
Support Vector Machines (SVMs) are powerful classification tools that find the optimal hyperplane separating classes in feature space. They maximize the margin between classes, reducing overfitting and improving generalization. SVMs work well for both linear and non-linear data.

SVMs use kernel functions to handle non-linear data, mapping it to higher dimensions for easier separation. Common kernels include linear, polynomial, and radial basis function (RBF). SVMs can be evaluated using metrics like accuracy, precision, recall, and ROC curves, with cross-validation to prevent overfitting.

SVM Fundamentals

Core Concepts and Margin Maximization

  • Support Vector Machines (SVMs) function as supervised learning models for classification and regression tasks
    • Primary focus involves finding the optimal hyperplane separating classes in feature space
  • Margin in SVM represents the distance between the decision boundary and the nearest data points from each class (support vectors)
  • SVMs maximize the margin between classes
    • Achieves better generalization
    • Reduces overfitting
  • Mathematical formulation of SVM solves a constrained optimization problem to find optimal hyperplane parameters
  • SVM introduces slack variables for non-linearly separable data
    • Allows some misclassifications while still maximizing margin
    • Useful for real-world datasets with noise or outliers
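  • Minimal sketch showing how a fitted linear SVM exposes its support vectors and margin width (the tiny 2-D dataset is an assumption for illustration):
    from sklearn.svm import SVC
    import numpy as np
    
    # Assumed toy 2-D, linearly separable data
    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
    y = np.array([0, 0, 0, 1, 1, 1])
    
    # A large C approximates a hard-margin SVM on separable data
    clf = SVC(kernel='linear', C=1e3).fit(X, y)
    
    print("Support vectors:\n", clf.support_vectors_)  # nearest points defining the margin
    w = clf.coef_[0]
    print("Margin width:", 2 / np.linalg.norm(w))       # geometric margin = 2 / ||w||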

Mathematical Foundations and Optimization

  • SVM optimization problem formulated as: $\min_{w,b} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to $y_i(w^T x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for all $i$
  • Dual formulation of SVM utilizes Lagrange multipliers
    • Enables efficient solving of optimization problem in high-dimensional spaces
  • Kernel trick allows implicit mapping to higher-dimensional spaces without explicit computation
    • Facilitates non-linear classification
  • Optimization techniques for SVM include:
    • Quadratic programming solvers
    • Sequential Minimal Optimization (SMO) algorithm
    • Gradient descent and its variants (stochastic gradient descent, mini-batch gradient descent)
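  • Minimal sketch of the primal objective above minimized with per-sample sub-gradient descent (the learning rate, epoch count, and toy data are assumptions):
    import numpy as np
    
    def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
        """Sub-gradient descent on 1/2 ||w||^2 + C * sum(hinge); y must be encoded as +1/-1."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for i in np.random.permutation(n):
                if y[i] * (X[i] @ w + b) < 1:        # point violates the margin
                    w -= lr * (w - C * y[i] * X[i])  # regularizer + hinge sub-gradient
                    b -= lr * (-C * y[i])
                else:
                    w -= lr * w                      # only the regularizer contributes
        return w, b
    
    # Usage with assumed toy data
    X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
    y = np.array([-1, -1, 1, 1])
    w, b = train_linear_svm(X, y)
    print(np.sign(X @ w + b))  # predictions on the training points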

SVM Implementation

Linear SVM Implementation

  • Linear SVM finds optimal hyperplane parameters (w and b) maximizing the margin between classes in linearly separable data
  • Implementation steps for linear SVM:
    1. Preprocess and normalize input data
    2. Initialize model parameters (w and b)
    3. Define hinge loss function
    4. Implement optimization algorithm (gradient descent)
    5. Update parameters iteratively to minimize loss
    6. Apply learned model to make predictions
  • Example linear SVM implementation in Python using scikit-learn:
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    
    # Preprocess data (fit the scaler on training data, apply it to test data)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_test_scaled = scaler.transform(X_test)
    
    # Create and train linear SVM model
    svm_model = SVC(kernel='linear', C=1.0)
    svm_model.fit(X_scaled, y)
    
    # Make predictions
    predictions = svm_model.predict(X_test_scaled)
    

Non-linear SVM and Multi-class Classification

  • Non-linear SVM implementation uses kernel functions to transform input space into higher-dimensional feature space
    • Enables linear separation in transformed space
  • Kernel functions (Radial Basis Function, Polynomial) map data to higher dimensions implicitly
  • Multi-class SVM classification techniques (see the sketch after the example below):
    • One-vs-One approach constructs binary classifiers for each pair of classes
    • One-vs-All approach trains one classifier per class against all others
  • Implementing SVM requires careful consideration of hyperparameters:
    • Regularization parameter C controls trade-off between margin maximization and misclassification penalty
    • Kernel-specific parameters (gamma for RBF kernel, degree for polynomial kernel) affect decision boundary shape
  • Example non-linear SVM implementation with RBF kernel:
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV
    
    # Preprocess data (fit the scaler on training data, apply it to test data)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_test_scaled = scaler.transform(X_test)
    
    # Define parameter grid for hyperparameter tuning
    param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]}
    
    # Create and train non-linear SVM model with grid search
    svm_model = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    svm_model.fit(X_scaled, y)
    
    # Make predictions using best model
    predictions = svm_model.predict(X_test_scaled)
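
  • Minimal sketch contrasting the One-vs-One and One-vs-Rest strategies described above (the iris dataset is used purely for illustration):
    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC
    
    # Three-class toy problem
    X_multi, y_multi = load_iris(return_X_y=True)
    
    # One-vs-One: one binary SVM per pair of classes -> k(k-1)/2 classifiers
    ovo = OneVsOneClassifier(SVC(kernel='rbf', C=1.0)).fit(X_multi, y_multi)
    
    # One-vs-Rest: one binary SVM per class against all others -> k classifiers
    ovr = OneVsRestClassifier(SVC(kernel='rbf', C=1.0)).fit(X_multi, y_multi)
    
    print(ovo.predict(X_multi[:5]), ovr.predict(X_multi[:5]))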
    

SVM Evaluation

Performance Metrics and Validation Techniques

  • Cross-validation techniques (k-fold cross-validation) evaluate SVM model performance and prevent overfitting
    • Splits data into k subsets, trains on k-1 subsets, and validates on the remaining subset
    • Repeats process k times, averaging results for robust performance estimate
  • Common evaluation metrics for SVM classification:
    • Accuracy measures overall correct predictions
    • Precision quantifies true positive rate among positive predictions
    • Recall (sensitivity) measures true positive rate among actual positive instances
    • F1-score provides harmonic mean of precision and recall
  • Receiver Operating Characteristic (ROC) curve plots true positive rate against false positive rate
    • Area Under the Curve (AUC) summarizes model performance across different classification thresholds
  • Confusion matrices offer detailed insights into classification errors
    • Rows represent actual classes, columns represent predicted classes
    • Diagonal elements show correct classifications, off-diagonal elements show misclassifications
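  • Short sketch computing the metrics above with scikit-learn (assumes a binary problem and the y_test, predictions, svm_model, and X_test_scaled variables from the earlier examples):
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve
    
    print("Accuracy :", accuracy_score(y_test, predictions))
    print("Precision:", precision_score(y_test, predictions))  # TP / (TP + FP)
    print("Recall   :", recall_score(y_test, predictions))     # TP / (TP + FN)
    print("F1-score :", f1_score(y_test, predictions))         # harmonic mean of precision and recall
    
    # ROC curve needs continuous scores, e.g. the SVM decision function
    fpr, tpr, thresholds = roc_curve(y_test, svm_model.decision_function(X_test_scaled))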

Model Diagnostics and Optimization

  • Learning curves diagnose bias and variance issues in SVM models
    • Plot training and validation performance against number of training examples
    • High bias indicated by poor performance on both training and validation sets
    • High variance shown by large gap between training and validation performance
  • Model selection techniques find optimal hyperparameter configurations:
    • Grid search exhaustively searches predefined parameter ranges
    • Random search samples parameter combinations randomly, often more efficient for high-dimensional spaces
  • Example of SVM evaluation and hyperparameter tuning:
    from sklearn.model_selection import cross_val_score, learning_curve
    from sklearn.metrics import confusion_matrix, roc_auc_score
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Perform cross-validation
    cv_scores = cross_val_score(svm_model, X_scaled, y, cv=5)
    print(f"Cross-validation scores: {cv_scores}")
    print(f"Mean CV score: {np.mean(cv_scores)}")
    
    # Generate confusion matrix
    cm = confusion_matrix(y_test, predictions)
    print("Confusion Matrix:")
    print(cm)
    
    # Calculate ROC AUC score
    roc_auc = roc_auc_score(y_test, svm_model.decision_function(X_test_scaled))
    print(f"ROC AUC Score: {roc_auc}")
    
    # Plot learning curve
    train_sizes, train_scores, valid_scores = learning_curve(
        svm_model, X_scaled, y, train_sizes=np.linspace(0.1, 1.0, 10), cv=5
    )
    plt.plot(train_sizes, np.mean(train_scores, axis=1), label='Training score')
    plt.plot(train_sizes, np.mean(valid_scores, axis=1), label='Validation score')
    plt.xlabel('Training examples')
    plt.ylabel('Score')
    plt.legend()
    plt.show()
    

Kernel Functions for SVM

Common Kernel Functions and Their Applications

  • Kernel functions enable implicit mapping of input data into higher-dimensional spaces
    • Facilitates efficient computation of dot products in transformed feature space
    • Makes non-linear classification feasible without explicit high-dimensional computations
  • Linear kernel computes standard dot product in input space
    • Suitable for linearly separable data or high-dimensional spaces
    • $K(x, y) = x^T y$
  • Polynomial kernel captures non-linear relationships using polynomial functions
    • Useful for problems with interaction features
    • $K(x, y) = (\gamma x^T y + r)^d$, where $d$ controls the polynomial degree
  • Radial Basis Function (RBF) kernel measures similarity based on Euclidean distance
    • Effective for various non-linear patterns
    • $K(x, y) = \exp(-\gamma \|x - y\|^2)$, where $\gamma$ controls the influence of a single training example
  • Sigmoid kernel derived from neural networks
    • Applicable to problems with sigmoid-like decision boundaries
    • $K(x, y) = \tanh(\gamma x^T y + r)$
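  • Minimal sketch evaluating each kernel formula above on one pair of vectors (the vectors and parameter values are arbitrary assumptions):
    import numpy as np
    
    x = np.array([1.0, 2.0])
    y = np.array([0.5, -1.0])
    gamma, r, d = 0.5, 1.0, 3
    
    linear  = x @ y                                  # K(x, y) = x^T y
    poly    = (gamma * (x @ y) + r) ** d             # (gamma x^T y + r)^d
    rbf     = np.exp(-gamma * np.sum((x - y) ** 2))  # exp(-gamma ||x - y||^2)
    sigmoid = np.tanh(gamma * (x @ y) + r)           # tanh(gamma x^T y + r)
    
    print(linear, poly, rbf, sigmoid)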

Kernel Selection and Custom Kernels

  • Choice of kernel function and parameters significantly impacts SVM model performance
    • Affects decision boundary shape and generalization ability
  • Kernel selection guidelines:
    • Linear kernel for high-dimensional data or linearly separable problems
    • RBF kernel as general-purpose kernel for unknown data distributions
    • Polynomial kernel when interaction features are important (e.g., natural language processing tasks)
  • Custom kernel functions can be designed for specific problem domains
    • Must satisfy Mercer's theorem to ensure valid kernel matrix
    • Example: custom kernels built from string similarity measures for text data (see the sketch after the pipeline example below)
  • Kernel alignment measures similarity between different kernel functions
    • Helps select most appropriate kernel for given dataset
    • Defined as normalized Frobenius inner product between kernel matrices
  • Kernel PCA and other kernel-based dimensionality reduction techniques combine with SVM
    • Improves classification performance in high-dimensional spaces
    • Example workflow combining Kernel PCA with SVM:
      from sklearn.decomposition import KernelPCA
      from sklearn.pipeline import Pipeline
      from sklearn.svm import SVC
      
      # Create pipeline with Kernel PCA and SVM
      pipeline = Pipeline([
          ('kpca', KernelPCA(n_components=50, kernel='rbf')),
          ('svm', SVC(kernel='linear'))
      ])
      
      # Fit pipeline to data
      pipeline.fit(X_train, y_train)
      
      # Make predictions
      predictions = pipeline.predict(X_test)
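
  • Minimal sketch of a custom (callable) kernel passed to SVC; the hand-written quadratic kernel is an assumed illustrative choice, not a string kernel:
    import numpy as np
    from sklearn.svm import SVC
    
    def custom_kernel(X, Y):
        # Any callable returning the Gram matrix K[i, j] = K(X[i], Y[j]) works,
        # provided it defines a valid (positive semi-definite) kernel
        return (X @ Y.T + 1.0) ** 2
    
    # X_train, y_train, X_test are assumed as in the pipeline example above
    svm_custom = SVC(kernel=custom_kernel)
    svm_custom.fit(X_train, y_train)
    custom_predictions = svm_custom.predict(X_test)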
      
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

