You can use your favorite social network to register or link an existing account:
Or use your email address to register without a social network:
Sign in with these social networks:
Or enter your username and password
Forgot your password?
Yes, please link my existing account with for quick, secure access.
No, I would like to create a new account with my profile information.
In this second post about our investments in Excel 2010 performance I’m going to wrap up talking about the challenges and opportunities we embraced for this release, specifically on the topics of large data set scenarios, calculation performance and investment in new multi-core features.
With the introduction of a bigger grid in Excel 2007 and knowing that 64-bit Excel (see this post) would enable our customers to really start taking advantage of all those cells, we took a closer look at our performance on large data sets. We investigated and implemented optimizations around common activities that people do with data, like entering and filling down data, sorting, filtering, and copy/pasting. We tested these scenarios pretty heavily to ensure that these scale as the amount of data gets bigger and bigger, especially beyond 2GB RAM.
We also spent time making sure that we optimized working set (memory usage) to be as efficient as possible. We ran over 300 tests covering basic use scenarios, analyzed working set metrics for each test, and tuned our code to reduce memory footprint. We also compared 64-bit Excel 2010 with the 32-bit version to ensure any increase was justified, and we further compared Excel 2010 with Excel 2007 to spot and fix regressions. This work ensures that Excel only uses the memory it really needs while running your workbooks, and ensures that Excel frees up memory it no longer needs as soon as possible.
Multi-threaded calculation introduced in Excel 2007 helped improve calculation performance, and we wanted to move the needle again this release. A significant portion of Excel workbooks are core to business processes, and calculation speed can be a bottleneck for some of our customers. Excel calculation performance improvements are incredibly valuable for our customers, and therefore this is a critical area of investment for us.
Just as we’ve done with the VBA performance area (see my previous post), we’ve collected up a bunch of workbooks from customers who care about calculation performance. We regularly benchmark that suite of workbooks, take a look at the results, see what can be optimized, implement changes, re-test, and so on, improving Excel 2010’s calculation speed. Here’s a snapshot of recent results compared with Excel 2007:
This shows that nearly 70% of the workbooks in our calculation performance test suite are significantly faster than Excel 2007.
There is one other unique investment in the area of calculation that I want to briefly mention: Asynchronous XLL User-Defined Functions (Asynch XLL UDFs). The idea behind Asynch XLL UDFs is this: we know that some of our customers use XLL UDFs as a way to fetch data from external data servers. The CPU usage on the machine running Excel isn’t very heavy, but the server handling the request may take a while to return the requested data. If you have 100 of these functions in A1:A100 for example, you can be waiting quite a while for your workbook to calculate, because Excel processes those cells one at a time. Since the local machine isn’t doing any heavy CPU processing, with Excel 2010 we’ve enabled the functionality to send all 100 XLL UDFs calls at one time, allowing the backend servers to process the requests in parallel. We’ll have more details about this feature when we delve into programmability improvements at a later date.
We also spent time adding more multi-core functionality into Excel 2010. As we were researching the scenarios in the investment areas I’ve described so far, we also kept an eye on what features would lend themselves well to parallel processing. After we completed algorithmic tuning and optimizations in those areas, we then went after more gains by implementing multi-core features in select areas:
The first three areas all involve pushing/pulling data, and we found that it’s best to split it up into two operations: one for fetching the data, the other for processing it into the appropriate structures (or the reverse for the save case). The benefit here is that once the first fetch is complete, the second thread can start processing that data into the workbook while at the same time the first thread gets right back to fetching the next set of data, like it shows in the picture:
Before implementing multi-threaded File.Open, those two processes would be stacked end to end. All of read needed to complete before load into memory started. Now, taking File.Open in Excel 2010 as an example, at “Start” Excel asks for the first part in the package (the workbook part). As soon as that’s off disk and decompressed, Excel tells the first thread to go fetch the next part in the book (probably a sheet), and at the same time starts up a second thread to get the workbook part data loaded into Excel memory structures. You can imagine that the structure and content of your workbook will significantly affect the performance gains. If there is just one huge sheet in the workbook and nothing else, well, it takes almost no time to load the workbook part, and a proportionally large amount of the time will be spent simply loading the one sheet. Not much gained. If you have two huge sheets, then there’s significant gain because Excel can fetch the second sheet off disk while the first sheet is still being loaded into memory.
I timed Excel 2007 vs. Excel 2010 loading a couple workbooks: one with 6 million cells filled with numbers, 3 million on sheet1 and 3 million on sheet2, and another workbook with 12 million cells filled with numbers, 6 million on sheet1 and 6 million on sheet2:
You can see there’s a bit of overhead in getting load set up and pulling that first part off disk, but these are some really nice gains, especially for such a common operation as opening workbooks. PivotTable Refresh works similarly, except the data is requested in same-sized packets until all rows have been retrieved.
We also wanted to improve the performance of common operations that can take a long time due to the size of the data, which includes cell table sort (both with and without a Table being applied to the range), PivotTable sort, and column auto sizing. We specifically chose column auto sizing because this operation was showing up as a “long running” operation in several of our scenarios, including PivotTable Refresh, QueryTable refresh, and user-initiated column auto fit.
I hope you all will be able to enjoy and benefit from a snappy and speedy Excel 2010!
Comments: (10) Collapse
So you messed up 2007 and now you have corrected 2010 to be as fast as 2003.... Can we please see comparisons between 2003 and 2010.
One basic improvement I'd like to see is smaller file sizes. In some circumstances (emailed attachments, for example) file size still matters and Excel 2007 files are much too big. .xlsx is a zipped format, but zipping some of my .xlsx files will reduce them to less than a quarter if their size. Of course there is a size/performance trade-off to some extent when compressing a file, but better size management has a place as well as performance improvements. Most Excel apps do not have a major performance problem.
The performance gain for both examples given is broadly speaking the same ratio - twice as many cells take twice as long to open in both versions.
The absolute time saved is obviously greater with larger numbers of cells, which is all good news, of course.
I would be interested to see if 2010 gives a substantial advantage opening workbooks with many more sheets, or other objects such as charts and pivot tables (refreshing data on open).
Any chance of a follow up post with a broader range of comparison.
Sam (et al) – We decided to not compare against Excel 2003 because of the significant differences between the two, for example the upgrades we made in architecture to support the bigger grid and the better graphics platform. Excel 2010 generally beats 2003 when it comes to calculation performance and our other multi-core investments. The results are more mixed when it comes to a) the “many shapes on a sheet” scenarios (though many scenarios have been addressed in Excel 2010 to the point where these workbooks are usable again (meaning sub-second response times for cases that were around 10 or 15 seconds)), and b) the VBA solutions category (there are so many solutions out there that touch different areas of Excel and do so many different things). It is fair to say that sometimes your code running in Excel 2010 is faster than running in Excel 2003, in other cases it runs in about the same time as in Excel 2003, and in yet other cases is still slower than in Excel 2003.
I (and several of our readers) have pointed out some performance best practices when writing VBA code (blogs.msdn.com/.../excel-vba-performance-coding-best-practices.aspx). As you can see from reading there, if you’re not careful, it’s easy to hang yourself by writing inefficient, long running code (e.g. iterating through a range of cells, one at a time, updating values). When it comes to writing VBA code to drive Excel, it really depends on what you’re doing, how you’ve written the code to do that, and the speed of the machine you’re running it on.
--Chad
@Chad:
Thank you for your response to Sam (and my late post to part 1 of this article).
Your logic is sound, when it comes to comparing performance for users already using 2007. However, for those of us who have declined to upgrade based on a variety of issues, we are still comparing the speed and efficiency of the workbooks we are using today in 2003 (or earlier) and how those same workbooks would perform if we upgrade to 2010.
I'll figure it out on my own, but a comparison including 2003 would be a better coming from you, because you have a wider variety of test workbooks. I plan to take a number of my 2003 workbooks and test them in 2010 (just like I did with 2007). If performance is noticably poorer, I'm likely to stick with my older version(s) [2003 and 97], regardless of how much 2010 improves on 2007.
My intent is not to devalue 2010- I really hope that it will be my next upgrade. I just hope that your team is benchmarking against your own 'best practice' performance, rather than 'last race' performance, to ensure a win.
Chad -
"We decided to not compare against Excel 2003 because of the significant differences between the two..."
This is a good point, but from the point of view of the user deciding whether to upgrade, each version of Excel is a black box: the user doesn't care about what's under the hood, the user is only interested in how it compares to what he knows.
For example, it's nice that you're using multi-core calculations, but as a user I don't care if you're using faster cockroaches in my PC to direct the electrons, as long as it's fast enough.
It's frustrating when code that already conforms to best practices doesn't perform as well as it previously had. Best practices should not be presented as a way to overcome performance difficulties in a particular version of Excel. They should be presented as a way for programmers to write more efficient code for any version of Excel.
Are we going to get medians in pivot tables?
1. When in manual calc and doing a Shift+F9 sheet calc, mark dirty cells as clean. Currently, only a full app-level calc ever marks dirty cells clean. Imagine a sheet with a "Today's date" off which many calcs depend, then some other entries that only affect a few. If you type in the date once, the entire sheet calcs on every shift+F9.
2. Make TODAY() only volatile when the date actually changes
3. Only propogate calculations down a chain if the result of an intermediate calc actually changes in value
4. Allow a function to be "Calc Once" volatile, i.e. will be treated as volatile on first open (and hence calc once), but then be non-volatile.
Greg: Medians is something on our list for future consideration. Thanks for the feedback.
Stephen: Thanks for the feedback. I've forwarded your comments to the appropriate feature owner.
TO Doug Jenkins:
Doug, you posted a question on the previous Performance post (comments are closed there now - hopefully you're reading this here).
I'd like to follow up with you on your question, and I need some details. Could you please hit the EMAIL link at the top of the page, and send out the contact form, so that we have your contact?
Thanks!
Danny Khen
Comments: (loading) Collapse