
23.2 Stochastic optimization techniques

3 min read • July 19, 2024

Stochastic optimization tackles problems with uncertain variables, aiming to maximize or minimize expected outcomes. It's crucial in fields like finance and logistics, where randomness plays a big role in decision-making.

Techniques like stochastic gradient descent help solve these problems efficiently. By using random subsets of data, these methods can handle large-scale problems quickly, though they may sacrifice some accuracy for speed.

Stochastic Optimization Techniques

Formulation of stochastic optimization problems

  • Stochastic optimization problems involve objective functions that depend on random variables (demand uncertainty) and constraints that may involve random variables (resource availability)
    • Goal is to minimize or maximize the expected value of the objective function (expected profit or cost)
  • Problem formulation requires defining decision variables (production quantities), specifying the objective function (may involve expectations, probabilities, or moments of random variables), and identifying constraints
    • Deterministic constraints are fixed and known with certainty (budget limits)
    • Chance constraints, also known as probabilistic constraints, hold with a certain probability (meeting demand with 95% confidence)
  • Examples of stochastic optimization problems include portfolio optimization with uncertain returns (stock prices), inventory management with stochastic demand (retail sales), and resource allocation under uncertainty (project scheduling)
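A common way to make such a problem computable is sample average approximation: replace the expectation with an average over simulated scenarios. Below is a minimal sketch for a newsvendor-style inventory problem with stochastic demand; the prices, demand distribution, and variable names are all illustrative assumptions, not taken from the text.

```python
import random

random.seed(0)

# Illustrative newsvendor problem: choose an order quantity q to maximize
# expected profit when demand D is random (here, uniform on [50, 150]).
PRICE, COST = 10.0, 6.0  # sell price and unit cost (assumed values)
demand_samples = [random.uniform(50, 150) for _ in range(10_000)]

def expected_profit(q, samples):
    """Sample-average approximation of E[profit(q, D)]."""
    total = 0.0
    for d in samples:
        sold = min(q, d)                  # cannot sell more than demand
        total += PRICE * sold - COST * q  # revenue minus ordering cost
    return total / len(samples)

# The problem is small, so a brute-force grid search over q suffices.
best_q = max(range(50, 151), key=lambda q: expected_profit(q, demand_samples))
print(best_q)
```

For this distribution the classical critical-fractile solution is q* = 50 + 100 · (PRICE − COST)/PRICE = 90, so the sample-based answer should land close to that.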

Implementation of stochastic gradient descent

  • Stochastic gradient descent (SGD) is an iterative optimization algorithm suitable for large-scale problems with many training examples (image classification datasets)
  • Algorithm steps:
    1. Initialize the model parameters (weights and biases)
    2. Repeat until convergence:
      • Randomly select a subset (mini-batch) of training examples (64 images)
      • Compute the gradient of the objective function with respect to the model parameters using the mini-batch (backpropagation)
      • Update the model parameters using the computed gradient and a learning rate (weight update rule)
  • Learning rate determines the step size in the parameter update and can be fixed or adaptive, decreasing over time (learning rate decay)
  • Mini-batch size is the number of training examples used in each iteration, representing a trade-off between computational efficiency and convergence behavior (larger batches yield less noisy gradients but make each iteration more expensive)
  • Variants of SGD include momentum (accelerates convergence by considering previous gradients), AdaGrad (adapts the learning rate for each parameter based on historical gradients), and Adam (combines momentum and adaptive learning rates)
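The algorithm steps above can be sketched end to end on a toy problem. This is a minimal mini-batch SGD implementation for 1-D linear regression; the synthetic data, learning rate, and batch size are illustrative assumptions.

```python
import random

random.seed(1)

# Synthetic data from y = 3*x + 2 + noise (assumed ground truth for the demo)
data = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(1_000))]

w, b = 0.0, 0.0           # 1. initialize the model parameters
lr, batch_size = 0.1, 64  #    learning rate and mini-batch size

for step in range(2_000):                    # 2. repeat until (approximate) convergence
    batch = random.sample(data, batch_size)  #    randomly select a mini-batch
    gw = gb = 0.0
    for x, y in batch:                       #    gradient of mean squared error
        err = (w * x + b) - y
        gw += 2 * err * x / batch_size
        gb += 2 * err / batch_size
    w -= lr * gw                             #    parameter update with learning rate
    b -= lr * gb

print(w, b)  # should end up near the true slope 3 and intercept 2
```

Each update uses only 64 of the 1,000 examples, which is why the per-iteration cost stays constant as the dataset grows.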

Convergence of stochastic algorithms

  • Convergence analysis studies the behavior of the optimization algorithm as the number of iterations increases, aiming to establish convergence guarantees and rates
  • Convex objective functions and constraints simplify convergence analysis, and SGD converges to the global optimum for convex problems (linear regression)
  • For non-convex problems (deep neural networks), convergence to a global optimum is not guaranteed, and SGD may converge to a local optimum or a saddle point
  • Convergence rates:
    • SGD: O(1/√k) for general convex problems
    • SGD: O(1/k) for strongly convex problems, where k denotes the number of iterations
  • Factors affecting convergence include the learning rate schedule (constant vs. adaptive), mini-batch size (larger batches may slow convergence), and noise in the gradients (stochastic vs. batch gradients)
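The O(1/k) rate for strongly convex problems can be seen numerically on a toy quadratic with a decaying learning rate schedule. Everything here (the objective, noise level, and η_k = 1/k schedule) is an illustrative assumption for the demo.

```python
import random

random.seed(2)

# Minimize the strongly convex f(x) = 0.5 * x**2 using noisy gradients
# g_k = x + noise. With the step size eta_k = 1/k, the squared error
# decays at roughly the O(1/k) rate stated above.
x = 10.0
errors = []
for k in range(1, 5_001):
    g = x + random.gauss(0, 1.0)  # stochastic gradient: true gradient + noise
    x -= (1.0 / k) * g            # decaying learning rate eta_k = 1/k
    errors.append(x * x)          # squared distance to the optimum x* = 0

print(sum(errors[-100:]) / 100)   # far below the starting error of 100
```

With a constant learning rate instead, the iterates would hover in a noise ball around the optimum rather than continuing to shrink, which is why decaying schedules matter for exact convergence.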

Comparison of stochastic optimization techniques

  • Stochastic gradient descent (SGD) is simple, widely used, and suitable for large-scale problems but may require careful tuning of learning rate and mini-batch size
  • Stochastic average gradient (SAG) maintains a running average of the gradients, leading to faster convergence than SGD for strongly convex problems but has higher memory requirements due to storing gradients
  • Stochastic variance reduced gradient (SVRG) reduces the variance of the gradients by periodically computing a full gradient, resulting in faster convergence than SGD and SAG for strongly convex problems but incurring additional computational overhead for the full gradient computation
  • Stochastic dual coordinate ascent (SDCA) operates in the dual space and is suitable for problems with sparse data (text classification), achieving linear convergence for strongly convex problems
  • Trade-offs between stochastic optimization techniques include convergence speed vs. computational complexity, memory requirements (storing gradients), and sensitivity to hyperparameters (learning rate, mini-batch size)
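To make the variance-reduction idea concrete, here is a minimal SVRG sketch on a toy strongly convex objective. The data, step size, and epoch counts are illustrative assumptions; the structure (outer snapshot with a full gradient, inner corrected stochastic steps) is the SVRG pattern described above.

```python
import random

random.seed(3)

# Toy strongly convex objective: f(x) = (1/n) * sum_i 0.5 * (x - a_i)**2,
# whose minimizer is the mean of the a_i (data values are illustrative).
a = [random.gauss(5.0, 1.0) for _ in range(200)]
opt = sum(a) / len(a)

def grad_i(x, i):
    """Gradient of the i-th component function."""
    return x - a[i]

x, lr = 0.0, 0.1
for epoch in range(10):                        # outer loop: take a snapshot
    snapshot = x
    full_grad = sum(grad_i(snapshot, i) for i in range(len(a))) / len(a)
    for _ in range(len(a)):                    # inner loop: cheap corrected steps
        i = random.randrange(len(a))
        # SVRG update: per-sample gradient, corrected by the snapshot's
        # per-sample gradient and full gradient to cancel sampling noise
        g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
        x -= lr * g

print(abs(x - opt))  # distance to the optimum
```

In this 1-D quadratic toy the correction term cancels the sampling noise exactly, so the iterates contract deterministically toward the optimum; on general problems the cancellation is only partial, but the variance still shrinks as the snapshot approaches the solution.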
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.