Stochastic Optimization: Casual Notes

License: CC 4.0 BY-SA

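This article explores the basic concepts, theoretical foundations, and applications of stochastic optimization (SO). It contrasts how deterministic and uncertain problems are handled and explains key concepts such as stages, periods, and transient versus steady-state models. It lists several important SO reference books, discusses core topics such as scenario generation, the non-anticipativity principle, and the horizon effect, and presents three approaches for assessing the quality of a discretization.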

I am currently learning stochastic optimization (SO) theory and will note the important points here. Some book references:

King & Wallace (2012) Modeling with Stochastic Programming. This one focuses on how to model a problem rather than how to solve it. It involves less math and is good for beginners.
Birge, John R., and Francois Louveaux. Introduction to stochastic programming. Springer Science & Business Media, 2011.
Shapiro, Alexander, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on stochastic programming: modeling and theory. Society for Industrial and Applied Mathematics, 2014.

Why do we need it

Real-world problems are never deterministic.
Sensitivity analysis is not a way of handling uncertainty. It is an analysis of deterministic decision problems.
Parameter estimates are always wrong.

Concepts

Stage: a point in time where we make decisions after learning something new. Stages must be identified first. Different models come with different definitions of stages.
Period: the time clock, often defined according to the occurrence of random events. For example, consider a facility location problem with random daily demands in the future. Fixing the location of the facilities is a one-stage decision, while the random demands occur every day, so the problem is multi-period.
Transient modeling vs. steady-state modeling: the former means we make a decision now based on what we know now, while the latter provides all possible decisions for all possible scenarios. SO is about the former. The latter, which I would call 'anticipativity', is treated by stochastic dynamic programming (DP)…
Time line: an SO model is not clear until the time line is drawn:
1st stage decision (here and now) -> some uncertainty disclosed -> 2nd stage decision (wait and see) -> some uncertainty disclosed…

Stages

As I understand it, the basic form of a two-stage problem is:
optimize $E_{\xi}(\min y(w,x) + q(w))$
where $x$ is the first-stage decision and $w\in \xi$ is a random event. The objective is to find the optimal FIRST-STAGE decision, i.e. the one that yields the best expected profit/cost.
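
For comparison, the textbooks listed above usually write the two-stage linear program with recourse as below. The notation follows Birge & Louveaux rather than the formula above: $c$, $A$, $b$ are first-stage data, $q$, $W$, $h$, $T$ are second-stage data that may depend on $\xi$, and $y$ is the second-stage (recourse) decision.

$$\min_{x}\; c^\top x + E_{\xi}\left[Q(x,\xi)\right] \quad \text{s.t.}\quad Ax = b,\; x \ge 0,$$

$$Q(x,\xi) = \min_{y}\left\{\, q(\xi)^\top y \;:\; W y = h(\xi) - T(\xi)\,x,\; y \ge 0 \,\right\}.$$

The inner minimization is the "wait and see" decision for one realization of $\xi$; the outer problem chooses the "here and now" decision $x$ against the expectation of that recourse cost.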

Inherently two-stage: the first stage is a long-term investment and the second stage is the short-term use of this investment. Two stages, two different types of decision.
Inherently multi-stage: many stages, same type of decision.
Two-stage or multi-stage? It depends on the modeler. If decisions are made in a rolling-horizon manner, then instead of building a multi-stage model we may use a two-stage model as a simplification. Later stages are aggregated since we do not need so much detail.
Non-anticipativity: if two scenarios are indistinguishable before time t, then the action taken at time t must be the same in both scenarios.
Horizon effect in multistage SO: some multistage problems have an infinite time horizon. When the horizon is long, we approach a steady state, but in SO we only care about the transient decision, so we have to represent the steady state in the model in order to obtain the right transient behavior. We don’t really care what to do far in the future.

To deal with the horizon effect, King & Wallace (2012) present what they call Dual Equilibrium as a tool for bringing the infinite steady-state effect into the SO model. See Section 2.3.6.
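
The non-anticipativity condition above can be checked mechanically on a small scenario tree. Below is a minimal sketch, assuming scenarios are stored as tuples of observed outcomes and decisions as tuples indexed by stage. The function name and data layout are made up for illustration and do not come from the books.

```python
from collections import defaultdict


def nonanticipativity_violations(scenarios, decisions):
    """Return (stage, history) pairs where the decisions are not unique.

    scenarios[s]: tuple of outcomes observed in scenario s, one per period.
    decisions[s]: tuple of decisions in scenario s; decisions[s][t] is taken
                  after observing scenarios[s][:t], so t = 0 is here-and-now.
    """
    violations = []
    n_stages = len(next(iter(decisions.values())))
    for t in range(n_stages):
        # Group scenarios that share the same history up to stage t.
        by_history = defaultdict(set)
        for s, outcomes in scenarios.items():
            by_history[outcomes[:t]].add(decisions[s][t])
        # Indistinguishable scenarios must carry exactly one decision.
        for history, taken in by_history.items():
            if len(taken) > 1:
                violations.append((t, history))
    return violations


# Hypothetical two-period tree with outcomes "up" / "down".
scenarios = {
    "uu": ("up", "up"),
    "ud": ("up", "down"),
    "du": ("down", "up"),
    "dd": ("down", "down"),
}
ok = {"uu": (10, 5), "ud": (10, 5), "du": (10, 8), "dd": (10, 8)}
bad = {"uu": (10, 5), "ud": (10, 6), "du": (10, 8), "dd": (10, 8)}
```

In `bad`, scenarios "uu" and "ud" share the history ("up",) at stage 1 but take different decisions, which is exactly what the condition forbids.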

Scenario generation

An SO problem is often solved in its deterministic equivalent form, built from a number of scenarios generated to describe the uncertainty. To do this, we must first generate the scenarios correctly.

It’s important to realize that we pass from random variables to a discretization because of the algorithm we have chosen. So it’s important to ensure that the discretization is not too far from reality with respect to the ALGORITHM!

Algorithms that do not need scenarios as input

stochastic decomposition: needs a very efficient implementation and applies only to linear programs
stochastic quasi-gradients
importance sampling

Algorithms that use scenario trees as input
Two problems:

A small number of scenarios => bad result. Quality depends on the randomness of generation.
A great number of scenarios => too large to be solved…

Where to sample from: if you do not have reliable information on the true distribution, just use the empirical one!

What is a good discretization?
It depends on the model! Our aim is not to approximate the real distribution. Instead, we want the algorithm to behave as if it were using the real distribution.

Approach 1

in-sample stability: $f(x^*_i,\mathcal{T}_i)\approx f(x^*_j,\mathcal{T}_j)$, where $x^*_i$ is the optimal solution obtained on scenario tree $\mathcal{T}_i$.
out-of-sample stability: $f(x^*_i,\xi)\approx f(x^*_j,\xi)$. Both sides can be computed by a simulation model with many more scenarios, or, more simply, by testing $f(x^*_i,\mathcal{T}_j)\approx f(x^*_j,\mathcal{T}_i)$.
Bias: a scenario generation method can be stable both in-sample and out-of-sample and still be bad. This should be tested by statistical methods, i.e. by evaluating the quality of the solution in addition to the stability.
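
The two stability checks can be tried on a toy problem. The sketch below uses a newsvendor model (order `x` now, sell `min(x, demand)` later) that I made up for illustration. The price, cost, uniform demand law and all function names are assumptions, and plain random sampling stands in for whatever scenario generator is being tested.

```python
import random

PRICE, COST = 5.0, 3.0  # hypothetical newsvendor data


def profit(x, demand):
    """Profit of ordering x when the realized demand is `demand`."""
    return PRICE * min(x, demand) - COST * x


def f(x, tree):
    """Expected profit of first-stage decision x on an equiprobable scenario set."""
    return sum(profit(x, d) for d in tree) / len(tree)


def solve(tree):
    """Best order on a tree; the objective is piecewise linear with
    breakpoints at the scenario demands, so one of them is optimal."""
    return max(tree, key=lambda x: f(x, tree))


def make_tree(rng, n):
    """Stand-in scenario generator: n i.i.d. draws from an assumed demand law."""
    return [rng.uniform(50, 150) for _ in range(n)]


def stability_report(n_scenarios, n_trees=5, seed=0):
    """Return (in-sample spread, out-of-sample spread) of objective values."""
    rng = random.Random(seed)
    trees = [make_tree(rng, n_scenarios) for _ in range(n_trees)]
    solutions = [solve(t) for t in trees]
    reference = make_tree(rng, 20000)  # large sample standing in for xi
    in_sample = [f(x, t) for x, t in zip(solutions, trees)]
    out_sample = [f(x, reference) for x in solutions]
    return max(in_sample) - min(in_sample), max(out_sample) - min(out_sample)
```

Comparing `stability_report(5)` with `stability_report(200)` gives a rough feel for how the spreads react to the number of scenarios. Small spreads indicate stability only; as noted above, they say nothing about bias.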

Approach 2

Replicate the distribution by respecting some of its important properties, such as the first, second, and third moments. Use a regression method to do this, or use the iterative procedure provided by King & Wallace (2012).
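
As a small illustration of matching moments, here is a three-point discretization of a normal distribution. It uses Gauss-Hermite quadrature nodes, which is a technique I swapped in for the example and not the iterative procedure of King & Wallace. It only applies when the target is assumed normal, and the function names are hypothetical.

```python
import math


def three_point_normal(mu, sigma):
    """Three scenarios for N(mu, sigma^2) from 3-point Gauss-Hermite quadrature.

    The result reproduces the mean, the variance, the zero third central
    moment, and the fourth central moment 3 * sigma**4 of the normal law.
    """
    offset = math.sqrt(3.0) * sigma
    values = [mu - offset, mu, mu + offset]
    probs = [1 / 6, 2 / 3, 1 / 6]
    return values, probs


def central_moment(values, probs, k):
    """k-th central moment of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** k for v, p in zip(values, probs))


# Example with assumed demand parameters mu = 100, sigma = 20.
values, probs = three_point_normal(100.0, 20.0)
```

Three scenarios are enough to match four moments here only because the target is symmetric and the nodes are chosen optimally. For general targets, more scenarios or a fitting procedure are needed.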

Approach 3

Generate scenarios by minimizing the distance between the generated distribution and the real one. These are called scenario reduction methods. They are used when we somehow know the distribution but want to represent it with the minimum number of scenarios. This requires integrating an optimization procedure into the scenario generation module.
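
One simple way to do this is a greedy forward selection. The sketch below is a simplified version for one-dimensional scenarios, with the absolute difference as the distance. The function name, the distance, and the greedy rule are assumptions made for illustration, not a specific published algorithm.

```python
def forward_selection(values, probs, n_keep):
    """Greedy scenario reduction for one-dimensional scenarios.

    Repeatedly keeps the scenario that most reduces the probability-weighted
    distance from every scenario to its nearest kept scenario, then moves the
    probability of each dropped scenario to its nearest kept one.
    Returns a list of (value, probability) pairs.
    """
    kept = []

    def total_distance(selected):
        return sum(
            p * min(abs(v - values[j]) for j in selected)
            for v, p in zip(values, probs)
        )

    while len(kept) < n_keep:
        candidates = [i for i in range(len(values)) if i not in kept]
        best = min(candidates, key=lambda i: total_distance(kept + [i]))
        kept.append(best)

    # Redistribute probabilities to the nearest kept scenario.
    new_probs = {j: 0.0 for j in kept}
    for v, p in zip(values, probs):
        nearest = min(kept, key=lambda j: abs(v - values[j]))
        new_probs[nearest] += p
    return [(values[j], new_probs[j]) for j in sorted(kept)]


# Two nearly identical scenarios and one outlier, reduced to two scenarios.
reduced = forward_selection([0.0, 0.1, 10.0], [1 / 3, 1 / 3, 1 / 3], 2)
```

In the example the two close scenarios are merged into one with probability 2/3, and the outlier is kept with probability 1/3.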

Optimality gap estimators

Use the expected optimal value over many trees as an estimator of the optimal value under the true distribution; some bound results can be derived.
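
One such bound is well known for minimization problems with sampled trees: the expected optimal value over sampled trees is no larger than the true optimal value, and the true cost of any fixed candidate solution is no smaller. Below is a minimal sketch on a made-up newsvendor problem. The price, cost, uniform demand law, sample sizes, and function names are all assumptions.

```python
import random
import statistics

PRICE, COST = 5.0, 3.0  # hypothetical newsvendor data


def cost(x, demand):
    """Cost (negative profit) of ordering x under a realized demand."""
    return COST * x - PRICE * min(x, demand)


def avg_cost(x, sample):
    return sum(cost(x, d) for d in sample) / len(sample)


def solve(sample):
    """Minimize the average cost; the optimum is at one of the sampled demands."""
    return min(sample, key=lambda x: avg_cost(x, sample))


def gap_estimate(candidate_x, n_trees=30, n_scenarios=50, n_eval=20000, seed=1):
    """Return (lower estimate, upper estimate, estimated gap) for candidate_x."""
    rng = random.Random(seed)

    def draw(n):
        return [rng.uniform(50, 150) for _ in range(n)]

    # Lower bound estimate: average optimal value over independent trees.
    trees = [draw(n_scenarios) for _ in range(n_trees)]
    lower = statistics.mean(avg_cost(solve(t), t) for t in trees)
    # Upper bound estimate: candidate's cost on a large independent sample.
    upper = avg_cost(candidate_x, draw(n_eval))
    return lower, upper, upper - lower


lower, upper, gap = gap_estimate(candidate_x=120.0)
```

Both numbers are sample estimates, so for a near-optimal candidate the estimated gap can come out slightly negative. In practice one would attach confidence intervals to both bounds.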