*Todayâ€™s author: Christian Stich, a program manager on the Excel Services team who likes to combine his finance and engineering backgrounds. Christian is going to talk about building a multithreaded UDF for Monte Carlo simulations and using it for options pricing.
*

The example code and workbook in this post demonstrate the implementation of a European Put/Call Option valuation model using a high performance Monte Carlo simulation. Readers with a finance background realize that European options can be valued using the Black-Scholes options pricing model â€“ the sample workbook also includes a calculation of the options values using Black-Scholes for comparison. This example is only meant to demonstrate the approach and implementation and does not intent to solve options valuation in the most efficient manner. The same basic Monte Carlo approach can be extended to price all kinds of derivatives/options (including American and Asian options as well as exotic options); moreover, the underlying distributions are typically not normal/log-normal. This is due to the fact that extreme price fluctuations are not as rare in reality as predicted by normal/log-normal distributions. Financial options are extremely important in risk management and mitigation, hedging and arbitrage and thus accurate and rapid pricing of options is very desirable.

The source code for the UDF and the workbooks can be downloaded at:

http://officeblogs.net/excel/MonteCarlo-Source Code+Workbooks.zip__
__

**Why use Monte Carlo Simulations?
**

Monte Carlo simulations are extremely useful in those cases where no closed form solutions to a problem exist. In some of those cases approximations to the solution exist; however, often the approximations do not have sufficient accuracy over the entire range. The power of Monte Carlo simulations becomes obvious when the underlying distributions do not have an analytical solution. Monte Carlo simulations are extensively used in the physical and life sciences, engineering, financial modeling and many other areas. In fact, significant amounts of processing times on supercomputers across the globe are used for Monte Carlo simulations.

**Advantages of using Excel and Excel Services for financial, engineering or scientific simulations
**

Excel and Excel Services provide a great platform for modeling of complex scenarios. The UDF is used to generate the inverse cumulative distribution values which are used to calculate the options values using formulas in the Excel workbook. Keeping the formulas for the option pricing model in Excel (rather than incorporating them into the UDF) provides a number of advantages. It allows subject matter experts (in this case financial analysts) to easily and quickly modify the model without having to resort to coding in C# or other programming languages. Moreover, the fact that the UDF source code does not need to be shared helps protect companiesâ€™ intellectual property. This is also very important to 3^{rd} party companies that provide plug-in solutions for Excel and Excel Services. Similarly, the fact that access to the models in the Excel Services workbooks can be tightly controlled and is not required for the coding of the UDF also helps the companiesâ€™ safeguard intellectual property.

**Advantages of using UDFs with Excel Services
**

UDFs allow programmers to write code that is compiled into machine language and not interpreted/evaluated, resulting in potentially significant performance gains. UDFs are well suited for implementations of Monte Carlo simulations as they are very computationally intensive. Data transfer between UDFs and Excel Services is very fast â€“ depending on the server hardware several hundred thousand cells in the workbook can be read and written by the UDF per second. This allows creating models in Excel Services workbooks that exchange large amounts of data with the UDF without incurring a heavy penalty due to the data transfer.

**Advantages of multithreading
**

CPU manufacturers are rapidly moving toward multi-core CPU architectures, increasing performance through parallelism in addition to the continuing increases in processing speeds. Todayâ€™s mainstream CPUs are dual- or quad-core with hyperthreading whereas a few years ago most CPUs contained a single non-hyperthreaded core. The number of processing cores is expected to rise steeply in the near future â€“ harnessing the processing power of those CPUs requires programs to be multithreaded to make efficient use of the parallelism. Furthermore, servers are nowadays available with more than two CPUs, each of which contains multiple hyperthreaded cores itself. The approach taken in this example Monte Carlo simulation is the essentially a non-recursive divide and conquer algorithm. The simulation is divided into a number of chunks which are then calculated on the different processing cores in parallel. This type of simulation is very well suited to be parallelized as each individual part of the simulation is independent.

**The code
**

##
##
##
## For additional information about using UDFs with Excel Services take a look at Shahar Prishâ€™s How UDFs work in Excel Services – a Primer, Danny Khenâ€™s posts on UDFs – Excel 2007 investments in UDFs #1 & Excel 2007 investments in UDFs #2: Existing UDFs, and Excel 2007 investments in UDFs #3: client/server solution using a core library.

##
## For additional information about using UDFs with Excel Services take a look at Shahar Prishâ€™s How UDFs work in Excel Services – a Primer, Danny Khenâ€™s posts on UDFs – Excel 2007 investments in UDFs #1 & Excel 2007 investments in UDFs #2: Existing UDFs, and Excel 2007 investments in UDFs #3: client/server solution using a core library.

## Overview of the code

## The code consists of a number of declarations and four methods. The four methods are MonteCarloNormalDist, CalculateDistribution, MultithreadedCalculateDistribution, and CalculateDistributionThread. The MonteCarloNormalDist methods purpose is to act as the interface between Excel Services and the actual Monte Carlo simulation which resides in the remaining methods. The CalculateDistribution method splits the simulation into a number of equal sized chunks and calls on the MultithreadedCalculateDistribution method. The MultithreadedCalculateDistribution method itself calls the CalculateDistributionThread method which performs the actual Monte Carlo simulation.

**Declarations
**

First we create a new class which we call MonteCarlo. We mark the class as a UDF class in order to make it available to Excel Services. We also define a class called ThreadWorkItem â€“ it contains the variables for the individual worker threads (more about those further down in the code).

using System;

using System.Threading;

using System.Collections.Generic;

using System.Text;

using Microsoft.Office.Excel.Server.Udf;

namespace MonteCarlo

{

[UdfClass]

public class MonteCarloClass

{

public class ThreadWorkItem

{

public double[] Result;

public int thread;

public int count;

public int iter;

public double mean;

public double stdDev;

public int distribution;

public double minVal;

public double maxVal;

public ManualResetEvent ManualEvent;

}

**MonteCarloNormalDist
**

Next we implement the MonteCarloNormalDist method and mark it as a UDF method and as volatile. This method supports array formulas as it determines the number of cells that are passed to the UDF. It then calls the CalculateDistribution method with the number of cells, desired number of threads, and the parameters that define the normal or log-normal distribution. If a cumulative distribution function is to be returned the method adds the values of the probability density function in order to arrive at the cumulative distribution function. Furthermore, this method can also generate an inverse cumulative distribution distribution which is needed for our options pricing model. Finally, the MonteCarloNormalDist method returns in calculated values to Excel Services.

[UdfMethod(IsVolatile = true)]

public object[,] MonteCarloNormalDist(

object[,] cellArray,

int iterations,

int threads,

double mean,

double stdDev,

int distribution,

double minVal,

double maxVal)

{

// get size of array

int cellArrayCount = cellArray.GetLength(0);

int i = 0;

int j = 0;

// get number of processor cores present in the system

int processorCores = Environment.ProcessorCount;

object[,] result = new object[cellArrayCount, 1];

double[] tmp = new double[cellArrayCount];

double cumulativeSum = 0.0;

double stepVal = 0.0;

double curVal = 0.0;

if (iterations > 0)

{

// round UP to nearest even number

if (iterations % 2 != 0)

{

iterations++;

}

// run number of threads as specified upto number of ‘cores’

// use “if (threads == 0)” in order to allow for more threads than ‘cores’

if (threads == 0 || threads > processorCores)

{

// 0 threads indicated default -> use number of ‘cores’

threads = processorCores;

}

tmp = CalculateDistribution(

cellArrayCount,

iterations,

threads,

mean,

stdDev,

distribution,

minVal,

maxVal);

// bit 0: 0 -> normal, 1 -> lognormal

// bit 1: 0 -> probability, 1 -> cumulative

// bit 2: 0 -> ‘regular’, 1 -> inverted cumulative

for (i = 0; i < cellArrayCount; i++)

{

if ((distribution & 0x6) != 0)

{

// cumulative distribution function

cumulativeSum = cumulativeSum + tmp[i];

result[i, 0] = cumulativeSum / (double)iterations;

}

else

{

// probability density function

// tmp[i] = tmp[i] / (double)iterations * (double)cellArrayCount / (maxVal – minVal);

tmp[i] /= (double)iterations;

tmp[i] *= (double)cellArrayCount;

tmp[i] /= (maxVal – minVal);

result[i, 0] = tmp[i];

}

}

}

if ((distribution & 0x4) != 0)

{

// inverse cumulative

// divide the cumulative sum into cellArrayCount slices

stepVal = (double)result[cellArrayCount-1,0] / (double)(cellArrayCount + 1);

curVal = stepVal / 2.0;

j = 0;

i = 0;

tmp[j++] = 0.0;

// invert the distribution

while (i < cellArrayCount)

{

if ((double)result[i, 0] < curVal)

{

i++;

}

else

{

j++;

if (j < cellArrayCount)

{

tmp[j] = i;

}

curVal = curVal + stepVal;

}

}

tmp[cellArrayCount – 1] = cellArrayCount;

for (i = 0; i < cellArrayCount; i++)

{

// tmp[i] = tmp[i] / (double)cellArrayCount * (maxVal – minVal) + minVal;

tmp[i] /= (double)cellArrayCount;

tmp[i] *= (maxVal – minVal);

tmp[i] += minVal;

result[i, 0] = tmp[i];

}

}

return result;

}

**CalculateDistribution
**

The following section of the code contains the CalculateDistribution method. Its purpose is to divide the simulation into equal sized chunks. The â€˜lastâ€™ thread is assigned a few more iterations if the number of iterations does not evenly divide by the number of threads. The method also sets the variables for the individual worker threads prior to spawning the multiple threads. Subsequently CalculateDistribution waits for all worker threads to complete and then aggregates the results from those threads.

public double[] CalculateDistribution(

int cellArraycount,

int iterations,

int threads,

double mean,

double stdDev,

int distribution,

double minVal,

double maxVal)

{

int i = 0;

int j = 0;

int remainingIterations = iterations;

int iterationsPerThread = 0;

double[] tmp = new double[cellArraycount];

ThreadWorkItem[] twi = new ThreadWorkItem[threads];

ManualResetEvent[] mre = new ManualResetEvent[threads];

// iterations divided by the number of threads

iterationsPerThread = iterations / threads;

// set variables for worker threads

for (i = 0; i < threads; i++)

{

twi[i] = new ThreadWorkItem();

twi[i].thread = i;

twi[i].count = cellArraycount;

if (i != threads – 1)

{

if (remainingIterations >= iterationsPerThread)

{

// assign iterations to the all but the last thread

twi[i].iter = iterationsPerThread;

}

else

{ // threads will only get 0 iterations if the number of iterations is less than 2*threads

twi[i].iter = 0;

}

// subtract the assigned iterations from the remaining iterations

remainingIterations = remainingIterations – twi[i].iter;

}

else

{

// assign “left over” iterations to the last thread

twi[i].iter = remainingIterations;

}

twi[i].ManualEvent = new ManualResetEvent(false);

twi[i].mean = mean;

twi[i].stdDev = stdDev;

twi[i].distribution = distribution;

twi[i].minVal = minVal;

twi[i].maxVal = maxVal;

}

// create new threads and wait for their completion

for (i = 0; i < threads; i++)

{

ThreadPool.QueueUserWorkItem(

new WaitCallback(MultiThreadedCalculateDistribution),

twi[i]);

}

for (i = 0; i < threads; i++)

{

mre[i] = twi[i].ManualEvent;

}

WaitHandle.WaitAll(mre);

// aggregate returned values from the individual worker threads

for (j = 0; j < cellArraycount; j++)

{

for (i = 0; i < threads; i++)

{

tmp[j] += twi[i].Result[j];

}

}

return tmp;

}

**MultithreadedCalculateDistribution and CalculationDistributionThread
**

This section of the code contains the actual parallel Monte Carlo implementation. The CalculateDistributionThread method uses the Box-Muller transform in order to efficiently calculate numbers that are normally or log-normally distributed.

void MultiThreadedCalculateDistribution(object state)

{

ThreadWorkItem twi = (ThreadWorkItem)state;

twi.Result = CalculateDistributionThread(

twi.thread,

twi.count,

twi.iter,

twi.mean,

twi.stdDev,

twi.distribution,

twi.minVal,

twi.maxVal);

twi.ManualEvent.Set();

}

double[] CalculateDistributionThread(

int thread,

int count,

int iter,

double mean,

double stdDev,

int distribution,

double minVal,

double maxVal)

{

int i = 0;

int index = 0;

double x1 = 0.0;

double x2 = 0.0;

double y1 = 0.0;

double y2 = 0.0;

double z = 0.0;

double range = maxVal – minVal;

double rangeCount = (double)(count) / range;

double[] tempArray = new double[count];

// to allow timer to advance for new random seed add “Thread.Sleep(1*thread);”

// also, depending on the requirements for the simulation a different method for

// generating the seed or a different random number generator should be used

Random rnd = new Random(thread);

// this thread may have 0 iterations, if overall number of iterations is very small

if (iter > 0)

{

// 2 random numbers are generated during each loop

for (i = 0; i < iter; i = i + 2)

{

do

{

x1 = 1.0 – rnd.NextDouble() * 2.0;

x2 = 1.0 – rnd.NextDouble() * 2.0;

z = x1 * x1 + x2 * x2;

} while ((z >= 1.0) || (z == 0.0));

// 1 – Pi/4 random numbers are discarded but this polar transform does not

// require trigonometric functions as does the cartesian version

z = Math.Sqrt(2.0 * (-Math.Log(z)) / z);

y1 = x1 * z;

y2 = x2 * z;

// divide by standard deviation and add the mean

y1 = y1 * stdDev + mean;

y2 = y2 * stdDev + mean;

// if log-normal distribution is to be returned

if ((distribution & 0x1) != 0)

{

y1 = Math.Exp(y1);

y2 = Math.Exp(y2);

}

// first random number

index = (int)Math.Round((y1 – minVal) * rangeCount);

// discard values outside the range

if (index >= 0 & index < count)

{

tempArray[index] = tempArray[index] + 1.0;

}

// second random number

index = (int)Math.Round((y2 – minVal) * rangeCount);

// discard values outside the range

if (index >= 0 & index < count)

{

tempArray[index] += 1.0;

}

}

}

return tempArray;

}

}

}

**Testing the UDF using an Excel Services workbook
**

We now want to verify that the UDF properly calculates the requested distribution functions. An easy and visual way of verifying the functionality is to use Excel Servicesâ€™ charting capabilities. The attached workbook contains a UDF array formula spanning 1,000 cells. The values returned by the UDF are plotted on an Excel X-Y scatter chart. We decide to plot the standard normal distribution probability density function with mean=0 and standard deviation=1 over the range from -3 to +3. Initially we use 50,000 iterations and we set the number of threads equal to the number of processing cores. Since we only used 50,000 iterations (resulting in an â€œaverageâ€ of 50 iterations per calculated cell), the distribution is not â€œsmoothâ€ as can be seen below.

We now increase the number of iterations to 50 million. Running the simulation may take less than 1 second to a few seconds depending on the number and speed of the processing cores present in the server. We can immediately see that the distribution is now â€œsmooth.â€ As mentioned previously, Monte Carlo simulations typically require a very large number of iterations in order to arrive at the desired accuracy.

Finally we want to verify that the inverse cumulative normal distribution function works as it is required for the European options pricing model. The chart below shows the generated X-Y scatter chart.

**Implementing the European options pricing model using an Excel Services workbook in combination with the UDF
**

Now we are ready to implement the European options pricing model. The attached workbook contains a UDF array formula spanning 100,000 cells that returns the inverse cumulative normal distribution. This distribution is then used to calculate the values for the European Call and Put options.

Again we are using all of the processor cores (setting Threads=0). With 50,000 iterations the Monte Carlo simulation results are within 0.06%for the Call option and within 3.4% for the Put option of the results achieved using the Black-Scholes option pricing model.

Now we are going to use 1 billion iterations. The Monte Carlo simulation results are within 0.0004% for the Call option and within 0.02% for the Put option of the results achieved using the Black-Scholes option pricing model. As stated above, it makes sense to use the Black-Scholes option pricing model for European options, however, Monte Carlo simulations are a good choice for other types of options. Moreover, as can be seen from this example, large increases in the number of iterations are required in order to significantly increase the precision of the result. Thus, multithreading Monte Carlo simulations very effectively decreases the required simulation times.

**Performance Improvements
**

The operating system and other applications are also using CPU processing time at the same time that the Monte Carlo simulation is running. This results in an uneven load on the different processing cores, resulting in those threads which are running on processing cores that have less of a load from other sources finishing sooner than those on heavily loaded processing cores. The goal for shortest simulation times should be to use all CPU resources evenly â€“ with all in parallel running threads finishing more or less simultaneously. The implementation in the given example works well if the server is lightly loaded due to the fact that the processing times of

Approaches to even out CPU core usage include the use of a dispenser and increasing the number of threads beyond the number of processing cores.

**Increasing the number of jobs to be much larger than the number of cores:
**

For this scenario we would simply increase the number of jobs beyond the number of processing cores. This way the systemâ€™s thread pool would effectively act as a dispenser, creating threads as processing cores become available. The advantage of this approach is that it works well if the speed with which the processing cores process the chunks varies. The difference in the time required to process a chuck is due processing requirements of other applications and the operating system and/or due to variability in processing complexity (unlike in this code example where each chunk takes the same CPU time to process). The disadvantage of this approach is the additional overhead introduced due to the increased number of jobs.

**Using a dispenser:
**

In this scenario the number of threads would be equal (or less) than the number of processing cores. However, instead of dividing the simulation into n large chucks (n = number of processing cores) as in the provided coding example one would divide the simulation into a much larger number of smaller chunks. The dispenser would hand out chunks to be processed to the individual worker threads. Once a thread completes its work on a chuck it would request another chunk from the dispenser to work on. This approach provides the same advantage as increasing the number of jobs. The disadvantage of this approach is the added complexity and the overhead introduced due to the use of the dispenser.

**Summary
**

Using Excel Services in conjunction with User Defined Functions allows high performance modeling of complex problems. Offloading the Monte Carlo simulation to the UDF allows for very efficient usage of CPU resources â€“ resulting in very high simulation rates, yet keeping the actual model in Excel results in an easy to maintain and modify model. The approach shown can be adapted to much more complex models in finance (options pricing, risk modeling, etc.) and also lends itself well to modeling in a large number of very diverse subject areas ranging from the physical sciences and various engineering disciplines, life sciences, social sciences to project management and more.