More on Performance Improvements in Excel 2010

In this second post about our investments in Excel 2010 performance, I’m going to wrap up the discussion of the challenges and opportunities we tackled for this release, specifically large data set scenarios, calculation performance, and new multi-core features.

Large Data Set Performance

With the introduction of a bigger grid in Excel 2007, and knowing that 64-bit Excel (see this post) would enable our customers to really start taking advantage of all those cells, we took a closer look at our performance on large data sets. We investigated and implemented optimizations around common activities that people do with data, like entering and filling down data, sorting, filtering, and copying/pasting. We tested these scenarios heavily to ensure that they scale as the amount of data gets bigger and bigger, especially beyond 2 GB of RAM.

We also spent time making sure that we optimized working set (memory usage) to be as efficient as possible. We ran over 300 tests covering basic use scenarios, analyzed working set metrics for each test, and tuned our code to reduce memory footprint. We also compared 64-bit Excel 2010 with the 32-bit version to ensure any increase was justified, and we further compared Excel 2010 with Excel 2007 to spot and fix regressions. This work ensures that Excel only uses the memory it really needs while running your workbooks, and ensures that Excel frees up memory it no longer needs as soon as possible.
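That sort of memory-footprint comparison can be reproduced in any test harness. Here is a minimal sketch in Python (an assumption on my part — our internal test tooling isn’t public) using the standard-library tracemalloc module to compare the peak allocation of two implementations of the same operation:

```python
import tracemalloc

def peak_allocated(fn, *args):
    """Run fn and return its peak Python heap allocation in bytes."""
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Two hypothetical implementations of the same "fill a column" operation.
def fill_eager(n):
    return [i * 2 for i in range(n)]       # materializes the whole list

def fill_lazy(n):
    return sum(i * 2 for i in range(n))    # streams, keeping O(1) values live

eager = peak_allocated(fill_eager, 100_000)
lazy = peak_allocated(fill_lazy, 100_000)
print(eager > lazy)  # the eager version peaks much higher
```

Running hundreds of scenarios through a harness like this, and comparing builds against each other, is how regressions in memory usage get spotted early.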

Calculation Performance

Multi-threaded calculation introduced in Excel 2007 helped improve calculation performance, and we wanted to move the needle again this release. A significant portion of Excel workbooks are core to business processes, and calculation speed can be a bottleneck for some of our customers. Excel calculation performance improvements are incredibly valuable for our customers, and therefore this is a critical area of investment for us.

Just as we’ve done with the VBA performance area (see my previous post), we’ve collected up a bunch of workbooks from customers who care about calculation performance. We regularly benchmark that suite of workbooks, take a look at the results, see what can be optimized, implement changes, re-test, and so on, improving Excel 2010’s calculation speed. Here’s a snapshot of recent results compared with Excel 2007:
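For readers who want to run a similar loop over their own code, here is a minimal benchmarking harness sketched in Python — the two “calculations” in the suite are purely illustrative stand-ins, not our actual test workbooks:

```python
import time

def benchmark(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workloads standing in for real workbook calculations.
suite = {
    "sum_range":  lambda: sum(range(1_000_000)),
    "sort_cells": lambda: sorted(range(100_000, 0, -1)),
}

for name, calc in suite.items():
    print(f"{name}: {benchmark(calc):.4f}s")
```

Taking the best of several runs reduces noise from other processes, which matters when you’re comparing small deltas between builds.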


This shows that nearly 70% of the workbooks in our calculation performance test suite are significantly faster than Excel 2007.

There is one other unique investment in the area of calculation that I want to briefly mention: Asynchronous XLL User-Defined Functions (Asynch XLL UDFs). The idea behind Asynch XLL UDFs is this: we know that some of our customers use XLL UDFs as a way to fetch data from external data servers. The CPU usage on the machine running Excel isn’t very heavy, but the server handling the request may take a while to return the requested data. If you have 100 of these functions in A1:A100 for example, you can be waiting quite a while for your workbook to calculate, because Excel processes those cells one at a time. Since the local machine isn’t doing any heavy CPU processing, with Excel 2010 we’ve enabled the functionality to send all 100 XLL UDFs calls at one time, allowing the backend servers to process the requests in parallel.  We’ll have more details about this feature when we delve into programmability improvements at a later date.
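To illustrate the effect, here is a toy simulation in Python (asyncio standing in for the XLL machinery — fetch_quote and its 0.1-second delay are purely illustrative, not the real UDF API): issuing all 100 calls at once takes roughly the latency of one call, instead of 100 times that.

```python
import asyncio
import time

async def fetch_quote(cell):
    """Stand-in for an XLL UDF that waits on an external data server."""
    await asyncio.sleep(0.1)          # simulated server latency
    return f"{cell}: done"

async def calculate_async(cells):
    # Issue every call at once; the "servers" work in parallel.
    return await asyncio.gather(*(fetch_quote(c) for c in cells))

cells = [f"A{i}" for i in range(1, 101)]     # A1:A100

start = time.perf_counter()
results = asyncio.run(calculate_async(cells))
elapsed = time.perf_counter() - start

# Done sequentially, 100 calls at 0.1 s each would take about 10 seconds;
# issued together, the whole batch completes in roughly 0.1 seconds.
print(len(results), f"{elapsed:.2f}s")
```

The same idea applies whether the parallelism lives in the client, as here, or in the backend servers handling the batched requests.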

New Multi-Core Features

We also spent time adding more multi-core functionality into Excel 2010. As we were researching the scenarios in the investment areas I’ve described so far, we also kept an eye on what features would lend themselves well to parallel processing. After we completed algorithmic tuning and optimizations in those areas, we then went after more gains by implementing multi-core features in select areas:

  • File.Open
  • File.Save
  • PivotTable Refresh (for external data sources, except OLAP and SharePoint Lists)
  • Cell Table Sort
  • PivotTable Sort
  • Column Auto Sizing

The first three areas all involve pushing or pulling data, and we found that it’s best to split the work into two operations: one for fetching the data, the other for processing it into the appropriate structures (or the reverse for the save case). The benefit here is that once the first fetch is complete, the second thread can start processing that data into the workbook while the first thread gets right back to fetching the next set of data, as the picture shows:


Before implementing multi-threaded File.Open, those two operations were stacked end to end: all of the reading from disk had to complete before loading into memory could start. Now, taking File.Open in Excel 2010 as an example, at “Start” Excel asks for the first part in the package (the workbook part). As soon as that’s off disk and decompressed, Excel tells the first thread to go fetch the next part in the book (probably a sheet), and at the same time starts up a second thread to load the workbook part data into Excel memory structures. You can imagine that the structure and content of your workbook will significantly affect the performance gains. If there is just one huge sheet in the workbook and nothing else, it takes almost no time to load the workbook part, and a proportionally large amount of the time will be spent simply loading that one sheet. Not much gained. If you have two huge sheets, there’s a significant gain because Excel can fetch the second sheet off disk while the first sheet is still being loaded into memory.
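The pipelining described above is a classic producer/consumer split. Here is a minimal sketch in Python (the sleeps stand in for disk reads and in-memory loading, and the part names are illustrative — this is not Excel’s actual file code):

```python
import queue
import threading
import time

parts = ["workbook", "sheet1", "sheet2", "sheet3"]
loaded = []

def read_part(name):
    time.sleep(0.05)       # simulated disk read + decompress
    return f"<{name} xml>"

def load_part(data):
    time.sleep(0.05)       # simulated parse into memory structures
    loaded.append(data)

def reader(q):
    # Thread 1: fetch the next part while thread 2 is still loading.
    for name in parts:
        q.put(read_part(name))
    q.put(None)            # sentinel: nothing left to read

def loader(q):
    # Thread 2: process each part as soon as it becomes available.
    while (data := q.get()) is not None:
        load_part(data)

q = queue.Queue()
t1 = threading.Thread(target=reader, args=(q,))
t2 = threading.Thread(target=loader, args=(q,))
t1.start(); t2.start()
t1.join(); t2.join()

# Stacked end to end, 8 steps at 0.05 s each would take ~0.4 s;
# overlapped, the pipeline finishes in roughly 0.25 s.
print(loaded)
```

Just as in the real feature, the gain depends on how evenly the work divides: one huge part and nothing else leaves the loader thread with no overlap to exploit.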

I timed Excel 2007 vs. Excel 2010 loading a couple of workbooks: one with 6 million cells filled with numbers (3 million on sheet1 and 3 million on sheet2), and another with 12 million cells filled with numbers (6 million on sheet1 and 6 million on sheet2):


You can see there’s a bit of overhead in getting load set up and pulling that first part off disk, but these are some really nice gains, especially for such a common operation as opening workbooks. PivotTable Refresh works similarly, except the data is requested in same-sized packets until all rows have been retrieved.

We also wanted to improve the performance of common operations that can take a long time due to the size of the data, which includes cell table sort (both with and without a Table being applied to the range), PivotTable sort, and column auto sizing. We specifically chose column auto sizing because this operation was showing up as a “long running” operation in several of our scenarios, including PivotTable Refresh, QueryTable refresh, and user-initiated column auto fit.

I hope you all will be able to enjoy and benefit from a snappy and speedy Excel 2010!