You can use your favorite social network to register or link an existing account:
Or use your email address to register without a social network:
Sign in with these social networks:
Or enter your username and password
Forgot your password?
Yes, please link my existing account with for quick, secure access.
No, I would like to create a new account with my profile information.
One of the things I mentioned in my overview of Excel 12 post in September was that we had done some work to speed up calculation on modern hardware (multi-processor or dual-core chips). I thought I would take a brief break from tables to summarize that work and to see if any readers are interested in beta testing this feature.
Multi-Threaded Calculation
To a large number of customers, Excel’s calculation speed is extremely important – perhaps the most important “feature” we ship. When planning Excel 12, we started a small investigation to look at different ways we could make Excel calculate faster on computers that had multi-processor or dual-core chips. The investigation turned out to be promising, so we continued the work, and the result is a very exciting feature that we refer to as multi-threaded calculation, or MTC. (Note – this is another working name, not a final name. Also note that our developers refer to this as MTR, or multi-threaded recalculation, but since most customers use the word calculation, I have decided to go that route for this post.)
In a nutshell, this feature enables Excel to spot formulas that can be calculated concurrently, and then run those formulas on multiple processors simultaneously. The net effect is that a given spreadsheet finishes calculating in less time, improving Excel’s overall calculation performance. Excel 12 can take advantage of as many processors (or cores, which to Excel appear as processors) as there are on a machine - when Excel loads a workbook, it asks the operating system how many processors are available, and it creates a thread for each processor. In general, the more processors, the better the performance improvement. Unsurprisingly, the design of the spreadsheet will have a direct impact on the size of any performance increase. At one end of the spectrum, a spreadsheet that has a lot of completely independent calculations (like a Monte Carlo simulation) should see enormous benefit. At the other end of the spectrum, a spreadsheet that has one completely linear set of calculations (where each formula depends on the previous formula, in linear fashion) will not see significant performance increases. The majority of the spreadsheets we see are in between these two ends of the spectrum, so we think that most everyone will see some amount of benefit from this work. Additionally, people who care about performance can tweak their spreadsheets to take advantage of this capability.
We think this is really exciting work – we are still profiling and testing performance and making tweaks, but based on what we have seen so far, things like Monte Carlo simulations (which are almost ideal test cases) see almost perfectly linear speedup - that is, using 2 threads on a dual-processor or dual-core machine cuts calc time nearly in half – and the majority of other workbooks also see noticeable reduction in calculation time. This is especially exciting because of the increasing availability of dual-core chips in standard business desktop machines … simply installing Excel 12 on these machines should lead to calculation performance improvements for users.
I can hear developers asking “what about user-defined functions (UDFs)?” The answer is that in Excel 12, we have enabled functions defined in XLLs to participate in multi-threaded calculation; VBA and automation add-in UDFs will not be multi-threaded (meaning, for example, that a VBA UDF and everything that depends on it cannot be processed simultaneously with other formulas). We have updated the XLL interface to allow developers to advertise their XLL functions as thread-safe (and to reference the big grid and a few other things to be covered in a later post about XLLs).
A few other things about this feature:
Regarding that last point - you may be asking yourself why you would want to specify more threads than the number of processors - the reason we enabled running multiple threads per processor was to help out our customers who run workbooks that make a lot of external calls using XLL user-defined functions (UDFs). In certain industries, we see lots of customers that have slow-calculating spreadsheets where the bulk of the calculation time is spent outside of Excel in UDFs … the UDFs often run on other machines. For example, a customer may have a spreadsheet that makes 100 calls to a number of XLL UDFs on a server. The UDFs perform data retrieval or intensive calculations. Once all 100 UDFs have returned results to Excel, the spreadsheet then does some analysis on the results of those calls. Given that Excel 2003 is single-threaded, if each call to an XLL UDF takes 30 seconds, the model takes almost an hour to run, even on a dual-processor machine. If they ran the same spreadsheet on Excel 12, by default, the calculation time would be cut roughly in half, since Excel 12 would make 2 XLL UDF calls at a time – one for each processor. If the user then manually set the number of threads to 8, Excel 12 could make 8 XLL UDF calls simultaneously, meaning the calc time would fall from close to an hour to a matter of minutes, assuming the server can return the data at the same rate in either case.
Would you like to help?
The Excel team has reserved a number of positions in our upcoming technical beta for individuals that have the interest, time, hardware, and spreadsheets to help test the work we have done in this area (and all the other work we have done, although we are looking for a few people that are particularly interested in calculation performance). If you are interested in helping test this feature, please use the “Email” link above to contact me and answer the questions below. Before you do so, however, please take a look at a few caveats:
If you have reviewed all that and are interested, please send me an email with the following:
Again, I can’t promise a spot to everyone that replies, so please don’t be disappointed if we don’t have space. And thanks in advance to anyone that does contact me.
Next time, more on tables, I promise.
Comments: (10) Collapse
On single processor/single core machines would XL12 run faster with threading disabled since that'd presumably eliminate the new dependency checking (on initial recalc) and unavoidable thread overhead (on all recalcs)? If so, will XL12 installation enable or disable threading based on how many processors/cores are available during installation?
If a user had, say, two workbooks open at the same time with no interworkbook references between them, the formulas in each would be independent. Would XL12 assign separate recalculation threads to each workbook so the user could continue working on workbook A while workbook B continues to recalculate? That'd be useful on single processor/single core machines.
Getting back to dependency checking, will XL12 store metadata on formula dependencies in the new XML workbooks? If workbook X had last been recalculated and save on a dual core machine with threading enabled, then opened and saved on a machine with threading disabled (no matter how many processors/cores it might have), would the metadata be lost? How will this affect shared workbooks?
Will OFFSET, INDIRECT and INDEX(..):INDEX(..) defeat dependency checking, or will it work similar to circular reference detection?
Harlan - I would imagine that even single processor systems could gain benefit if there were some other external dependancy (e.g. the hard disk, the network) that was slower.
Also I'd wait on the requesting the exact methods for tuning since it's the kind of thing most likely to shift.
Interesting stuff.
Can you comment more on how Excel handles forumula evaluation? At one point I thought I had heard it actually compiles worksheet formulas to machine code...
I guess you're leveraging the code that checks for circular references to find all independent strands?
And if there'd be test spreadsheets being supplied, I could make them run on my old trusty dual P3-933 running Windows 2003. I have no idea if I'm nominated though; until now I've just received an invite and signed up.
Sounds really good.
Whilst the subject is the calculation engine:
Have you added any user control of the granularity of the dependency tree? (pre excel 2002 worksheet level, 2002/3 global level). Multi-threaded recalculation would seem to naturally fit with a more granular approach, and whilst most workbooks calc faster with the global dependency tree there are some that calc slower.
Any exploitation of 64-bit processors?
Any improvements to Range.calculate (it got slowed down in XL 2002 when it was changed to handle within-range dependencies)?
have you added a workbook.Calculate method to complement range.calculate sheet.calculate and all_open_workbooks.calculate?
Hi folks,
Harlan – we will be smart and on single core/processor machines we will not do any of the work required to multi-thread, so users will not see a performance impact and they will not need to disable threading. The scenario you outline about working on workbook A while workbook B is calcing will not be supported in 12 – that would take more work than just the calculation engine. We are still finalizing designs around handling dependency information. With respect to OFFSET etc., no, it will not defeat dependency checking.
mschaef – what specifically are you interested in hearing about?
Mario – shoot me an email and we can discuss further.
Charles – we have not added any control over the granularity of the tree, though we do hear some requests. No specific exploitation of 64-bit processors right now. We have not changed range.calculate nor added a workbook.calculate method. I would be interested to hear the sorts of scenarios that give rise to these questions if you can share.
David,
I think the Excel team have been absolutely correct to pursue two approaches to improving calculation speed: generic improvements for all users and controllability improvements for specialist users (typically traders & users of large financial /forecasting /engineering /decision support models).
Scenarios:
Range.calculate without dependencies. For cases where you need the fastest possible recalculation and can control the design of the critical block of formulae Range.calculate is still often the fastest way, but the introduction of the within-range dependency calc has slowed it down, often significantly.
Workbook.calculate: when you have a large number of workbooks with some volatility open (which happens more often as Excel's memory capacity increases and the price of RAM falls) you often want to recalc only the active workbook, or in more complex cases control the calculation sequence of the workbooks. Currently you can program this using a somewhat complex scheme with worksheet .enablecalculation, but this method has several drawbacks. Many users are surprised to find that F9/Automatic recalculates all open workbooks rather than just the active book. So it would be good to have an additional calculation option (Active workbook only) and the ability to calculate a specific workbook (workbook.calculate)
Tree granularity; Its difficult to give a specific scenario, but when you have a complex slow-calculating workbook you need all the help you can get: for some workbooks the old method is faster than the new method and you could tweak calculation speed by moving stuff between sheets. Probably also need workbook-level granularity for workbook.calculate.
64-bit: I guess the questions are: what performance improvement is possible and when will there be a sufficient number of 64-bit processors in the marketplace?
One could reasonably expect that Automatic Recalc only recalculates the active workbook and those workbooks in RAM that have dependencies on the active workbook.
For MTC : does that mean that DataTables could be executed in parallel ? That would be great. We rely on them a lot, as well as on array formulas, VBA UDFs (slow, too), and any combination of the above.
Not a high powered financial analyst, but I the quicker my pest forecast models run the better. 64bit Windows is here, but no use if the software doesn't run take advantage of native 64 bit mode. Combination of 64 bit and hyperthreading with an Athlon X2 is enticing. Would like to see speed tests with Excel 12 set to use these abilities. Plenty of 64 processors sold in last 2 years to justify supporint 64-bit.
According to article below, upgrading to Athlon X2 should be straight cpu swap
from: hardware.gamespot.com/Story-ST-17478-1945-x-x-x
If you already own an Athlon 64, AMD made
the upgrade path to the Athlon X2 very easy. Most existing socket 939 motherboards that support the Athlon 64 should support the Athlon X2. AMD claims that the vast majority of motherboards will be able to run Athlon X2s with nothing more than a BIOS update. Be sure to check your power supply though, as AMD recommends the use of a 550W power supply for use with the Athlon X2.
Thanks everyone for the feedback.
Wim - yes, Data Tables and array formulas will benefit from multiple procs/cores. VBA UDFs will not (as explained above).
Comments: (loading) Collapse