Numerical simulation of SDEs#


This is a reproduction of certain scripts found in Higham, An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations [Higham, 2001]. This paper is an accessible introduction to SDEs, centered around ten scripts. Below are reproductions of these scripts (excluding two on linear stability) and some supplementary notes.

Why stochastic differential equations#

We are often interested in modelling a system whose state takes values in a continuous range and evolves over a continuous time domain. Whereas ordinary differential equations (ODEs) describe variables which change according to a deterministic rule, SDEs describe variables whose change is governed partly by a deterministic component and partly by a stochastic component. SDEs are therefore an appropriate model for systems whose dynamics involve some true randomness, or some fine-grained complexity which we cannot afford to, or do not wish to, model.

The Wiener process#

In order to define the stochastic component of the transition rule of a stochastic system, we must define an appropriate noise model. The Wiener process is a stochastic process that is commonly used for this purpose.

Definition 146 (Wiener process)

A standard Wiener process over $[0, T]$ is a random variable $W(t)$ that depends continuously on $t \in [0, T]$ and satisfies:

  1. $W(0) = 0$, with probability 1.

  2. For $0 \leq t_1 < t_2 \leq T$, the random variable $W(t_2) - W(t_1)$ has distribution $\mathcal{N}(0, t_2 - t_1)$.

  3. For $0 \leq t_1 < t_2 < t_3 < t_4 \leq T$, the random variables $W(t_2) - W(t_1)$ and $W(t_4) - W(t_3)$ are independent.

We can imagine the Wiener process as the path followed by a particle that experiences infinitely many, infinitesimal kicks. The size of these kicks, $W(t_2) - W(t_1)$, diminishes as the interval between them, $t_2 - t_1$, diminishes. The kicks are also independent of each other, so the future path of the particle is independent of its past path given its present position.

Sampling from a Wiener process#

Before using the Wiener process to define an SDE, let's look at the process itself. How can we draw a sample $W(t)$ from it? Since a sample from the Wiener process takes a random value for every $t \in [0, \infty)$, the best we can do on a computer is to sample the process at a finite subset of time instances. We specify the times $t_1 < t_2 < \dots < t_N$ at which to sample $W(t)$, and then use the definition of the Wiener process to see that we should sample as follows:

$$W(t_{n+1}) = W(t_n) + \Delta W(t_n), \quad \text{where } \Delta W(t_n) \sim \mathcal{N}(0, t_{n+1} - t_n),$$

where $W(t_0) = W(0) = 0$. We can therefore sample all the $\Delta W(t_n)$ independently and take their cumulative sum to compute $W(t_n)$, as shown below.

import numpy as np

# Set random seed
np.random.seed(0)

# Integration parameters
T = 1
N = 500
dt = T / (N - 1)  # spacing of the N-point time grid below

# Times to sample at
t = np.linspace(0, T, N)[:, None]

# Sample dW's and compute cumulative sum
dW = dt ** 0.5 * np.random.normal(size=(N - 1, 1))
W = np.concatenate([[0], np.cumsum(dW)])
*Figure: a sample path of the Wiener process on $[0, 1]$.*

So even though we can’t represent the entirety of the path, we can sample it to arbitrary precision.

Function of a Wiener process#

Suppose we are interested in a stochastic process which is a function of a Wiener process, say

$$X(t) = \exp\left(t + \tfrac{1}{2} W(t)\right).$$

In this case we can sample $X(t)$ by first sampling $W(t)$ and then computing the corresponding values of $X(t)$. We'll draw $S = 1000$ samples of $X(t)$, compute their mean and standard deviation, and show three of these samples.

# Set random seed
np.random.seed(0)

# Time to simulate for, discretisation level and number of paths
T = 1
N = 500
S = 1000
dt = T / (N - 1)  # spacing of the N-point time grid below

# Times to sample at
t = np.linspace(0, T, N)

# Sample dW's and compute cumulative sum
dW = dt ** 0.5 * np.random.normal(size=(N - 1, S))
W = np.concatenate(
    [np.zeros(shape=(1, S)), np.cumsum(dW, axis=0)],
    axis=0,
)

# Compute sample values, their mean and standard deviation
X = np.exp(t[:, None] + 0.5 * W)
X_mean = np.mean(X, axis=1)
X_stdev = np.std(X, axis=1)
*Figure: three sample paths of $X(t)$, together with the sample mean and standard deviation computed from $S = 1000$ paths.*

Evaluating a stochastic integral#

The next thing that we look at is the evaluation of a stochastic integral. Let’s consider the integral

$$Y = \int_0^1 W(t)\,dW(t),$$

where $W(t)$ is a Wiener process. To evaluate it, we first need to define what the integral means in the stochastic case. In the deterministic case we use the Riemann integral, which is the limit of a discretised sum:

$$R = \int_a^b f(t)\,dt = \lim_{N \to \infty} \sum_{n=0}^{N-1} f(a + n\,\delta t)\,\delta t,$$

where $\delta t = \frac{b - a}{N}$. We have chosen to evaluate $f$ on the left side of the discretisation bins. In the deterministic case, it does not matter where we evaluate $f$ within a discretisation bin, meaning that the integral

$$R_\lambda = \lim_{N \to \infty} \sum_{n=0}^{N-1} f(a + (n + \lambda)\,\delta t)\,\delta t$$

does not depend on the choice of $\lambda \in [0, 1]$ when we take $\delta t \to 0$ (provided $f$ is sufficiently well behaved). For stochastic integrals, this does not hold: the choice of where to evaluate the integrand affects the value of the integral, even in the limit $\delta t \to 0$; we will see an example from Higham shortly. We therefore have to make a choice in defining the integral. Two widespread choices are the Ito and the Stratonovich integrals:

$$\int_a^b h(t)\,dW(t) = \lim_{N \to \infty} \sum_{n=0}^{N-1} h(t_n) \left(W(t_{n+1}) - W(t_n)\right) \qquad \text{(Ito)},$$

$$\int_a^b h(t)\,dW(t) = \lim_{N \to \infty} \sum_{n=0}^{N-1} h\!\left(\frac{t_n + t_{n+1}}{2}\right) \left(W(t_{n+1}) - W(t_n)\right) \qquad \text{(Stratonovich)},$$

where we have defined $t_n = a + n\,\delta t$. While Ito evaluates the integrand on the left side of the discretisation bin ($\lambda = 0$), Stratonovich evaluates it at the midpoint of the bin ($\lambda = 1/2$). Our stochastic integral of interest

$$Y = \int_0^1 W(t)\,dW(t)$$

is equal to the following values under the Ito and Stratonovich definitions:

$$\int_0^T W(t)\,dW(t) =
\begin{cases}
\frac{1}{2} W(T)^2 - \frac{1}{2} T & \text{under Ito,} \\
\frac{1}{2} W(T)^2 & \text{under Stratonovich.}
\end{cases}$$
Proof: Evaluating $\int W(t)\,dW(t)$ under the Ito and Stratonovich integrals

If we use the Ito integral, we have

$$\begin{aligned}
\sum_{n=0}^{N-1} W(t_n) \left(W(t_{n+1}) - W(t_n)\right)
&= \frac{1}{2} \sum_{n=0}^{N-1} \left( W(t_{n+1})^2 - W(t_n)^2 - \left(W(t_{n+1}) - W(t_n)\right)^2 \right) \\
&= \frac{1}{2} \left( W(T)^2 - W(0)^2 - \sum_{n=0}^{N-1} \left(W(t_{n+1}) - W(t_n)\right)^2 \right).
\end{aligned}$$

Each term $(W(t_{n+1}) - W(t_n))^2$ has mean equal to the second moment of $\Delta W_{t_n}$, which is $\delta t$, and variance equal to the fourth moment of $\Delta W_{t_n}$ minus the square of its second moment, which is $3\delta t^2 - \delta t^2 = 2\delta t^2$. Therefore, the sum above is a random variable with mean $T$ and variance $O(\delta t)$, where we have used the fact that the summands are independent, so the variance of the sum is the sum of the variances. So in the limit $\delta t \to 0$, the integral converges to $\frac{1}{2} W(T)^2 - \frac{1}{2} T$ under the Ito definition.

By contrast, if we use the Stratonovich integral, we have

$$\begin{aligned}
\sum_{n=0}^{N-1} W\!\left(\frac{t_{n+1} + t_n}{2}\right) \left(W(t_{n+1}) - W(t_n)\right)
&= \sum_{n=0}^{N-1} \left( \frac{W(t_{n+1}) + W(t_n)}{2} + \Delta Z_n \right) \left(W(t_{n+1}) - W(t_n)\right) \\
&= \sum_{n=0}^{N-1} \left( \frac{1}{2} W(t_{n+1})^2 - \frac{1}{2} W(t_n)^2 + \Delta Z_n \left(W(t_{n+1}) - W(t_n)\right) \right) \\
&= \frac{1}{2} W(T)^2 - \frac{1}{2} W(0)^2 + \sum_{n=0}^{N-1} \Delta Z_n \left(W(t_{n+1}) - W(t_n)\right),
\end{aligned}$$

where $\Delta Z_n \sim \mathcal{N}(0, \delta t / 4)$. To obtain the first equality above, we used the fact that

$$p\!\left(W\!\left(\tfrac{t_{n+1} + t_n}{2}\right) \,\middle|\, W(t_{n+1}), W(t_n)\right) = \frac{p\!\left(W(t_{n+1}) \,\middle|\, W\!\left(\tfrac{t_{n+1} + t_n}{2}\right)\right) p\!\left(W\!\left(\tfrac{t_{n+1} + t_n}{2}\right) \,\middle|\, W(t_n)\right) p(W(t_n))}{p(W(t_{n+1}), W(t_n))},$$

and observing that the right hand side has the form of a product of normal distributions over $W\!\left(\tfrac{t_{n+1} + t_n}{2}\right)$, we arrive at the result:

$$p\!\left(W\!\left(\tfrac{t_{n+1} + t_n}{2}\right) \,\middle|\, W(t_{n+1}), W(t_n)\right) = \mathcal{N}\!\left(W\!\left(\tfrac{t_{n+1} + t_n}{2}\right);\; \frac{W(t_{n+1}) + W(t_n)}{2},\; \frac{\delta t}{4}\right).$$

Since $\Delta Z_n$ is independent of $W(t_n)$ and $W(t_{n+1})$, the term $\Delta Z_n \left(W(t_{n+1}) - W(t_n)\right)$ has mean $0$ and variance $\delta t^2 / 4$. Therefore, the sum term has mean $0$ and variance $O(\delta t)$, so in the limit $\delta t \to 0$ the integral converges to $\frac{1}{2} W(T)^2$ under the Stratonovich definition.
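The code that produced the outputs below is hidden, but a minimal sketch of how the two approximations can be computed is given here. Following the proof above, the midpoint value of the Wiener process is sampled as the average of the bin edges plus independent $\mathcal{N}(0, \delta t / 4)$ noise; the printed values depend on the seed and sampling scheme, so they need not reproduce the outputs below exactly.

# Set random seed
np.random.seed(0)

# Discretisation of [0, T]
T = 1
N = 500
dt = T / N

# Sample the Wiener increments and path
dW = dt ** 0.5 * np.random.normal(size=(N,))
W = np.concatenate([[0], np.cumsum(dW)])

# Ito: evaluate the integrand at the left edge of each bin
ito = np.sum(W[:-1] * dW)

# Stratonovich: evaluate at the bin midpoint, sampled as the average
# of the bin edges plus independent N(0, dt / 4) noise
W_mid = 0.5 * (W[:-1] + W[1:]) + (dt / 4) ** 0.5 * np.random.normal(size=(N,))
strat = np.sum(W_mid * dW)

print(f"Ito integral approximation (exact): {ito:.3f} ({0.5 * W[-1] ** 2 - 0.5 * T:.3f})")
print(f"Stratonovich approximation (exact): {strat:.3f} ({0.5 * W[-1] ** 2:.3f})")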

Ito integral approximation (exact): 1.419 (1.415)
Stratonovich approximation (exact): 1.917 (1.915)

The difference between the Ito and Stratonovich integrals does not vanish as $\delta t \to 0$, which you can verify by experimenting with dt above. The choice of definition has implications for the resulting integral (itself a stochastic process), which may be more or less appropriate for different applications. From here onwards we will work with the Ito integral exclusively, although much of the discussion is amenable to the Stratonovich integral too.

Euler-Maruyama method#

The Euler-Maruyama method is the analogue of the Euler method for ODEs, applied to the stochastic case.

Definition 147 (Euler-Maruyama method)

Given a scalar SDE with drift and diffusion functions $f$ and $g$,

$$dX(t) = f(X(t))\,dt + g(X(t))\,dW(t),$$

the Euler-Maruyama method approximates $X$ by

$$X_{n+1} = X_n + f(X_n)\,\Delta t + g(X_n)\,\Delta W_n,$$

where $\Delta t > 0$ is the time step, $X_n$ denotes the approximation to $X(t_n)$, $W_n = W(t_n)$ and $t_n = n\,\Delta t$.

Let's look at an implementation of the Euler-Maruyama (EM) method. The euler_maruyama function below takes the drift and diffusion functions f and g, and applies the EM algorithm from the specified initial condition. Note that we can sample all the dW's in advance.

def euler_maruyama(seed, X0, T, N, f, g):
    
    # Set random seed
    np.random.seed(seed)
    
    # Set discretisation, initial value and times at which to evaluate
    dt = T / N
    X = [X0]
    t = np.linspace(0, T, N + 1)
    
    # Sample Wiener process dW's
    dW = dt ** 0.5 * np.random.normal(size=(N,))
    
    for i in range(N):
        
        # Apply Euler-Maruyama at each point in time
        dX = f(X[-1], t[i]) * dt + g(X[-1], t[i]) * dW[i]
        
        # Store the new X
        X.append(X[-1] + dX)
    
    # Compute W to return it at the end
    W = np.concatenate([[0], np.cumsum(dW)])
    
    return t, X, W

Below is the definition of f and g that we will be integrating, namely

$$f(x, t) = \lambda x, \qquad g(x, t) = \mu x,$$

known as the Black-Scholes model. This is implemented as a closure, i.e. f_g_black_scholes takes in the appropriate lambda and mu and returns the corresponding f and g.

def f_g_black_scholes(lamda, mu):

    def f(X, t):
        return lamda * X

    def g(X, t, grad=False):
        # With grad=True, return the derivative dg/dX = mu
        # (this will be used by Milstein's method later)
        return mu if grad else mu * X

    return f, g

We choose these drift and diffusion terms because the associated SDE has a closed form solution, with which we can compare our numerical solution. The analytic solution to the Black-Scholes model is

$$X(t) = X_0 \exp\left[\left(\lambda - \tfrac{1}{2} \mu^2\right) t + \mu W(t)\right],$$

which we implement in exact_black_scholes below. Unlike ODEs, whose solution is a unique function, the solution of an SDE depends on the random noise sample W(t). It’s important to remember to share the same W(t) sample between the exact solution and its numerical approximation.

def exact_black_scholes(X0, t, W, lamda, mu):
    return X0 * np.exp((lamda - 0.5 * mu ** 2) * t + mu * W)

We can now run the EM method and compare it against the exact solution below.
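The cell that produced the figure below is hidden; the sketch here shows how such a comparison might be set up, with the parameter values lamda = 2, mu = 1 and X0 = 1 assumed for illustration.

# Build the Black-Scholes drift and diffusion (assumed parameter values)
lamda, mu, X0 = 2.0, 1.0, 1.0
f, g = f_g_black_scholes(lamda, mu)

# Run Euler-Maruyama and evaluate the exact solution on the same W(t)
t, X_em, W = euler_maruyama(seed=0, X0=X0, T=1, N=500, f=f, g=g)
X_exact = exact_black_scholes(X0, t, W, lamda, mu)

# Maximum discrepancy between the approximate and exact paths
print(np.max(np.abs(np.array(X_em) - X_exact)))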

*Figure: the Euler-Maruyama approximation plotted against the exact solution of the Black-Scholes model.*

We can change the accuracy of the solution by adjusting N. As a fun aside, what if we try out a different drift term? One nice choice is

$$f(x, t) = \omega \cos(\omega t), \qquad g(x, t) = \mu x.$$

In the case $\mu = 0$ (with $X_0 = 0$), the solution is the deterministic function $X(t) = \sin(\omega t)$. When $\mu \neq 0$, the solution is perturbed by Gaussian noise.

def f_g_sine(omega, mu):
    
    def f(X, t):
        return omega * np.cos(omega * t)
    
    def g(X, t):
        return mu * X
    
    return f, g
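To produce a solution like the one shown below, we can pass these drift and diffusion functions to euler_maruyama; the omega and mu values in this sketch are assumed for illustration, not taken from the hidden cell.

# Integrate the sinusoidal-drift SDE (assumed omega and mu values)
f, g = f_g_sine(omega=2 * np.pi, mu=0.25)
t, X, W = euler_maruyama(seed=0, X0=0.0, T=1, N=500, f=f, g=g)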
*Figure: Euler-Maruyama solution of the SDE with sinusoidal drift.*

Strong and weak convergence#

Since the number of discretisation bins N affects the accuracy of our method, we are interested in how quickly the approximation converges to the exact solution as a function of N. To study this, we must first define what convergence means in the stochastic case, which leads us to two distinct notions of convergence: the strong sense and the weak sense.

Definition 148 (Strong convergence)

A method for approximating a stochastic process $X(t)$ is said to have strong order of convergence $\gamma$ if there exists a constant $C$ such that

$$\mathbb{E}\,\big|X_n - X(\tau_n)\big| \leq C\,\Delta t^\gamma$$

for any fixed $\tau_n = n\,\Delta t \in [0, T]$ and $\Delta t$ sufficiently small.

Strong convergence refers to the rate at which the approximation $X_n$ converges to the exact solution $X(\tau_n)$, in expectation, as $\Delta t \to 0$. A weaker notion of convergence is the rate at which the expected value of the approximation converges to the true expected value as $\Delta t \to 0$, as given below.

Definition 149 (Weak convergence)

A method for approximating a stochastic process $X(t)$ is said to have weak order of convergence $\gamma$ if there exists a constant $C$ such that

$$\big|\mathbb{E}[X_n] - \mathbb{E}[X(\tau_n)]\big| \leq C\,\Delta t^\gamma$$

for any fixed $\tau_n = n\,\Delta t \in [0, T]$ and $\Delta t$ sufficiently small.

The paper states without proof that, under appropriate conditions on f and g, Euler-Maruyama has strong order of convergence 1/2 and weak order of convergence 1. We do not prove these statements here, but instead evaluate the rates of convergence empirically.
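The experiment behind the figure below is hidden; a sketch of how such an empirical check might look for the Black-Scholes model follows. The parameter values and path count are assumptions, and the weak error uses the known expectation $\mathbb{E}[X(T)] = X_0 e^{\lambda T}$ of geometric Brownian motion.

# Empirical strong and weak errors of EM on Black-Scholes
# (assumed parameters; the hidden cell may differ)
np.random.seed(0)

lamda, mu, X0, T = 2.0, 1.0, 1.0, 1.0
M = 10000  # number of sampled paths

for N in [16, 32, 64, 128, 256]:
    dt = T / N

    # Sample increments for all paths and run EM, vectorised over paths
    dW = dt ** 0.5 * np.random.normal(size=(M, N))
    X = X0 * np.ones(M)
    for n in range(N):
        X = X + lamda * X * dt + mu * X * dW[:, n]

    # Exact endpoint values, driven by the same Wiener paths
    X_exact = X0 * np.exp((lamda - 0.5 * mu ** 2) * T + mu * np.sum(dW, axis=1))

    strong = np.mean(np.abs(X - X_exact))               # estimates E|X_N - X(T)|
    weak = np.abs(np.mean(X) - X0 * np.exp(lamda * T))  # E[X(T)] = X0 exp(lamda T)
    print(f"N = {N:4d}, strong error = {strong:.4f}, weak error = {weak:.4f}")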

*Figure: empirical strong and weak convergence of the Euler-Maruyama method, on log-log axes.*

Milstein’s higher order method#

Just as higher order methods for ODEs exist for obtaining refined estimates of the solution, so do methods for SDEs, such as Milstein’s higher order method.

Definition 150 (Milstein’s method)

Given a scalar SDE with drift and diffusion functions $f$ and $g$,

$$dX(t) = f(X(t))\,dt + g(X(t))\,dW(t),$$

the Milstein method approximates $X$ by

$$X_{j+1} = X_j + f(X_j)\,\Delta t + g(X_j)\,\Delta W_j + \tfrac{1}{2}\,g(X_j)\,g'(X_j) \left(\Delta W_j^2 - \Delta t\right),$$

where $\Delta t > 0$ is the time step, $X_j$ denotes the approximation to $X(\tau_j)$, $W_j = W(\tau_j)$ and $\tau_j = j\,\Delta t$.

Milstein’s method achieves a strong convergence rate of 1 and a weak convergence rate of 1.

Let’s first use Milstein’s method to get a single solution of the Black-Scholes model, as we did for EM.
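The notebook's implementation is hidden, but a sketch of what it might look like is given below: the milstein function mirrors euler_maruyama, using the grad flag of the g returned by f_g_black_scholes to obtain the derivative g'. This is a reconstruction under those assumptions, not necessarily the original cell.

def milstein(seed, X0, T, N, f, g):

    # Set random seed
    np.random.seed(seed)

    # Set discretisation, initial value and times at which to evaluate
    dt = T / N
    X = [X0]
    t = np.linspace(0, T, N + 1)

    # Sample Wiener process dW's
    dW = dt ** 0.5 * np.random.normal(size=(N,))

    for i in range(N):

        # Euler-Maruyama terms plus the Milstein correction term
        dX = (
            f(X[-1], t[i]) * dt
            + g(X[-1], t[i]) * dW[i]
            + 0.5 * g(X[-1], t[i]) * g(X[-1], t[i], grad=True)
            * (dW[i] ** 2 - dt)
        )

        X.append(X[-1] + dX)

    # Compute W to return it at the end
    W = np.concatenate([[0], np.cumsum(dW)])

    return t, X, W

This can be called exactly like euler_maruyama, e.g. milstein(seed=0, X0=1.0, T=1, N=500, f=f, g=g) with f and g from f_g_black_scholes.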

*Figure: the Milstein approximation plotted against the exact solution of the Black-Scholes model.*

We can also look at the strong and weak rates of convergence for Milstein's method. Milstein's method achieves a strong convergence rate of 1, as opposed to the 1/2 strong rate of Euler-Maruyama.

*Figure: empirical strong and weak convergence of Milstein's method, on log-log axes.*

Stochastic chain rule#

Suppose we want to evaluate a function $V(\cdot)$ at various $X(t)$, i.e. $V(X(t))$. If $X(t)$ were a deterministic quantity, such as the solution to an ODE, we could solve for $X(t)$ and plug it into $V$. Alternatively, we could express the evolution of $V$ itself as a differential equation using the chain rule:

$$dV = \frac{dV}{dX}\,dX = \frac{dV}{dX}\,f(t)\,dt, \quad \text{where } dX = f(t)\,dt.$$

This way, we could solve the following ODE directly for $V$:

$$\frac{dV}{dt} = \frac{dV}{dX}\,f(t).$$

For an SDE, however, the chain rule takes a different form, which under the Ito definition is as follows.

Theorem 126 (Ito’s result for one dimension)

Let $X_t$ be an Ito process given by

$$dX_t = U_t\,dt + H_t\,dW_t,$$

where $U_t, H_t$ are square-integrable processes, and let $V(X, t)$ be a twice continuously differentiable function. Then $Y_t = V(X_t, t)$ is again an Ito process and

$$dY_t = \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial X}\,dX_t + \frac{1}{2}\,\frac{\partial^2 V}{\partial X^2}\,H_t^2\,dt.$$

If $V$ does not depend on $t$, we have

$$dY_t = \left(\frac{\partial V}{\partial X}\,U_t + \frac{1}{2}\,\frac{\partial^2 V}{\partial X^2}\,H_t^2\right) dt + \frac{\partial V}{\partial X}\,H_t\,dW_t.$$
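As a quick application of this result (a supplementary example, not from the paper), we can recover the exact Black-Scholes solution quoted earlier. Taking $V(X) = \log X$ and the Black-Scholes process $dX_t = \lambda X_t\,dt + \mu X_t\,dW_t$, so that $U_t = \lambda X_t$ and $H_t = \mu X_t$, Ito's result gives

$$d \log X_t = \left(\frac{1}{X_t}\,\lambda X_t - \frac{1}{2}\,\frac{1}{X_t^2}\,\mu^2 X_t^2\right) dt + \frac{1}{X_t}\,\mu X_t\,dW_t = \left(\lambda - \tfrac{1}{2}\mu^2\right) dt + \mu\,dW_t,$$

and integrating from $0$ to $t$ yields $X(t) = X_0 \exp\left[\left(\lambda - \tfrac{1}{2}\mu^2\right) t + \mu W(t)\right]$, as stated above.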

For a more formal definition and proof of Ito’s result see [Oksendal, 1992] (Theorem 4.1.8 and pages 44-48). Below is a short sketch proof, which highlights why the additional term appears in the formula.

Informal argument: Ito’s result for one dimension

Writing the infinitesimal difference in $Y_t$ as a Taylor expansion, we get

$$dY_t = \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial X}\,dX_t + \frac{1}{2}\left[\frac{\partial^2 V}{\partial X^2}\,dX_t^2 + 2\,\frac{\partial^2 V}{\partial X\,\partial t}\,dt\,dX_t + \frac{\partial^2 V}{\partial t^2}\,dt^2\right] + o(dt^2),$$

where the $o(dt^n)$ notation denotes omitted terms whose ratio to $dt^n$ goes to $0$ as $dt \to 0$. Now since $dX_t = U_t\,dt + H_t\,dW_t$ and $dW_t$ is of order $dt^{1/2}$, the last two terms in the square brackets are of order $dt^{3/2}$ and $dt^2$, and we can neglect them. The reference provides a formal argument for neglecting these terms, showing that their contribution to the Ito integral has zero mean and a variance that tends to $0$ as $dt \to 0$; that is, these contributions converge to $0$ in mean square. However, the first term in the brackets contains a contribution of order $dt$ which does not vanish. In particular,

$$dX_t^2 = U_t^2\,dt^2 + 2\,U_t H_t\,dW_t\,dt + H_t^2\,dW_t^2.$$

In the above expression, we can neglect the first two terms, which are of order $dt^2$ and $dt^{3/2}$ respectively. Again, the formal argument is that their associated contributions to the Ito integral converge to $0$ in mean square. Dropping the $\frac{\partial V}{\partial t}$ term as well (taking $V$ independent of $t$, as in the second form of Ito's result above), this yields the expression

$$dY_t = \frac{\partial V}{\partial X}\,U_t\,dt + \frac{1}{2}\,\frac{\partial^2 V}{\partial X^2}\,H_t^2\,dW_t^2 + \frac{\partial V}{\partial X}\,H_t\,dW_t.$$

Now consider the contribution to the Ito integral of the $dW_t^2$ term, i.e. the sum

$$\sum_{n=1}^{N} \frac{1}{2} \left.\frac{\partial^2 V}{\partial X^2}\right|_{X_{t_n}} H_{t_n}^2\,dW_{t_n}^2 = \sum_{n=1}^{N} a_n\,dW_{t_n}^2,$$

where $t_n = n\,dt$ and $N = T / dt$. This sum has expectation $\sum_{n=1}^{N} a_n\,dt$, and it can be shown that its variance goes to $0$ as $dt \to 0$, so the contribution converges to $\sum_{n=1}^{N} a_n\,dt$ in mean square and we can write

$$dY_t = \left(\frac{\partial V}{\partial X}\,U_t + \frac{1}{2}\,\frac{\partial^2 V}{\partial X^2}\,H_t^2\right) dt + \frac{\partial V}{\partial X}\,H_t\,dW_t.$$

With this corrected rule, we can directly integrate the stochastic process for V.
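As a numerical sanity check of this rule (an illustration, not from the paper): for $V(X) = X^2$ applied to $X_t = W_t$, so $U_t = 0$ and $H_t = 1$, Ito's result gives $dY_t = dt + 2 W_t\,dW_t$. We can integrate this directly and compare against $V(W_T) = W_T^2$.

# Numerical check of Ito's rule for V(X) = X^2 with X_t = W_t
np.random.seed(0)

T = 1
N = 100000
dt = T / N

# Sample a Wiener path
dW = dt ** 0.5 * np.random.normal(size=(N,))
W = np.concatenate([[0], np.cumsum(dW)])

# Integrate dY = dt + 2 W dW and compare against V(W_T) = W_T^2
Y = np.sum(dt + 2 * W[:-1] * dW)
print(Y, W[-1] ** 2)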

References#

[Hig01]

D.J. Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43(3):525–546, 2001.

[Oks92]

Bernt Oksendal. Stochastic Differential Equations (3rd Ed.): An Introduction with Applications. Springer-Verlag, Berlin, Heidelberg, 1992. ISBN 3387533354.