Back
Excel

Programmability Improvements in Excel 2010

Thanks to Danny Khen for putting together this post.

What’s new?

Excel has been a platform for developing business solutions for a long time now. Developers write code to customize and extend Excel in many ways – automating Excel operations, user interface customizations, data entry forms and extending Excel formulas with user defined functions, to name some key ones. Many developers create very complex applications, which have demanding requirements and push Excel to its limits, so developers are always looking for more ways to get the job done.

With Office 2010 right around the corner, we thought it would be useful to mention the various ways we’ve improved Excel’s extensibility this time around.

I’ll touch the following in this post:

  • Asynchronous UDFs (user-defined functions)
  • Running UDFs remotely on an HPC (high-performance computing) cluster
  • Migrating Excel 4 Macros to VBA
  • Macro recording for charts and shapes
  • 64-bit support and code compatibility
  • Open XML SDK improvements

Asynchronous UDFs (User-Defined Functions)

Say you have a UDF that takes a long time to complete, but does not actually perform local intensive processor operations. For example, a UDF that brings back a stock quote from a public web service. An Excel calculation thread calls UDFs in a serial fashion, one after the other, waiting for each call to complete before proceeding down the calculation chain. Therefore, if the UDF is called many times on a worksheet, a calculation cycle could needlessly take a very long time – an order of N of the time of a single call completion. Multithreaded recalculation will not do the trick either. It cannot scale very well for this case, because multiple threads have their overhead, and therefore it is not practical to define as many threads as you have calls to the UDF on the sheet.

Excel 2010 enables you to author these non-processor-intensive UDFs as asynchronous. This ability is supported in XLL add-ins, and the new Excel 2010 SDK has all you need for writing asynchronous UDFs. Essentially, you break your UDF in two: a synchronous function call, which sets up the asynchronous calculation, data request, external web service call etc., and returns immediately; and an asynchronous part, which returns the result to Excel when it is ready. Excel tracks pending UDF calls that have not yet completed, and continues independent parts of the calculation. At some point, the UDF call result becomes available to the add-in (this is typically triggered by an event, such as an incoming web service result, an asynchronous I/O operation completion, etc.). The add-in then calls back into Excel with the UDF result.

Sounds complex, but it’s really not that hard (assuming you know your way around XLL development). Let’s say your XLL add-in has a “normalâ€, synchronous UDF, which takes one pointer to an XLOPER12, and returns the same:

LPXLOPER12 WINAPI MyNormalFunc(LPXLOPER12 oper)

In order to turn it into the first (callable) part of an asynchronous UDF, you remove the return value (since the result is going to be returned asynchronously later on), and you add a trailing parameter (we will explain its purpose in a bit):

void WINAPI MyAsyncFunc(LPXLOPER12 oper, LPXLOPER12 asyncHandle)

To register this asynchronous UDF, you designate the return value type (void) with a “>†in the function’s type string – the string where a character code specifies the type of each parameter:

LPXLOPER12 pxTypeText = L”>QX”;

Excel12(xlfRegister, &xResult, 5, &xModuleText, pxProcedure,

pxTypeText, pxFunctionText, pxArgumentText);

The function’s new trailing argument is an XLOPER of type xltypeBigData and is used for an asynchronous call handle. It is invisible on the UDF call in the Excel formula. The Excel model author calls it with a single argument, just like the original synchronous function used to be called. Excel passes a unique handle to the function for each call during a calculation cycle, and uses the handle to track its progress. Your add-in should keep track of each instance of this handle. Later on, when the result of this particular UDF call is ready, your add-in code uses the handle to call back into Excel, using the new callback xlAsyncReturn, in order to provide the result (and after this point, the add-in can discard the handle). In our example, this should look something like the following:

XLOPER12 xlResult;

LPXLOPER12 pxHandle; // make it point to the async handle tracking the call

LPXLOPER12 pxResult; // make it point to your UDF call result

Excel12(xlAsyncReturn, &xlResult, 2, pxHandle, pxResult);

The benefit? A big boost in calculation speed – an order of one calculation time, instead of an order of N. If a single stock quote takes 1 second to come back from a web service, and you have 100 calls (assuming abundant resources on the web service side), this means 1 sec instead of 100 sec – really significant.

The beta version of the new Excel 2010 XLL SDK is available online here. It has all the source files and documentation you need to create asynchronous UDFs.

Running UDFs Remotely on an HPC (High-Performance Computing) Cluster

Many organizations with high computation power demands have high-performance compute clusters. These are large arrays of processors (“nodesâ€), with infrastructure to submit computation jobs to the cluster nodes and collect results. Highly concurrent calculation models can greatly benefit from HPCs.

Say you have a UDF with a very heavy calculation, called many times on some Excel worksheets, with the calls mostly independent from one another. Turning the UDF into asynchronous will not really help calculation speed; all calls still need to calculate on however many processors your computer has, whether synchronously or not. But if you have an HPC cluster at your disposal, and if you could send those function calls off to be calculated on the cluster, it would greatly boost the calculation speed of your sheet. Creating and managing HPC jobs, however, is not an easy task and could require nontrivial coding.

Enter Excel 2010. HPC admins can now deploy XLL add-ins with supported UDFs (a word on that in a bit) to the cluster’s compute nodes. Excel users can configure a compute cluster to be used in calculations, using simple UI in Excel’s options:

image

The user turns on “Allow user-defined XLL functions to run on a compute clusterâ€, selects the type (typically there would be one – the type of HPC cluster that the organization has), and, in the “Options†dialog, sets the host name for the cluster’s head node.

With that in place, Excel automatically sends UDF calls off to the cluster asynchronously, waits for results, and plugs them back into the Excel model – all without the need to modify anything in the Excel model!

Supported UDFs are XLL UDFs, which are registered by the add-in as “cluster-safe,†meaning that they are essentially independent calculations – they are stateless, and do not call back into Excel. A cluster-safe UDF should not perform any operations that rely on the state of the local machine (storing off-process data, open connections, etc.). Once a developer verifies that a UDF is cluster-safe, registering the UDF as such is just a tiny modification to the add-in code – adding another flag to the registration call. There is no need to modify anything in the function code itself to support communication with the HPC cluster or management of the remote call. All there is to do is to add an “&†to the registration’s type string:

LPXLOPER12 pxTypeText = L”QQ&“;

Excel12(xlfRegister, &xResult, 5, &xModuleText, pxProcedure,

pxTypeText, pxFunctionText, pxArgumentText);

Obviously, if your existing UDF is not cluster-safe, you’d need to rewrite it in such a way that it is, in order to register it as cluster-safe and take advantage of automatic call offloading to HPC clusters.

To summarize: Cluster-safe UDFs are offloaded to a high performance compute cluster, without the need to modify the Excel model, and without the need to rewrite the UDFs to use a cluster. Pretty neat!

Look for a dedicated post about this feature on this blog in the near future.

Migrating Excel 4 Macros to VBA

Prior to the introduction of VBA into Excel, we had our own macro facility – known as Excel 4 macros, used in Excel macro sheets. Most people have long since migrated their Excel 4 macros to VBA; however, some Excel 4 macro capabilities were missing from VBA, which made this migration difficult.

In Excel 2010, one of our goals was to remove any remaining barriers that people had to complete the migration of Excel 4 macros to VBA. This work generally fell into two categories:

  1. Adding new objects, methods and properties to Excel’s object model for the few remaining Excel 4 macro functions that did not have equivalents in VBA.
  2. Performance Improvements for Page Setup operations in VBA to provide similar performance to PAGE.SETUP() functionality in Excel 4 Macros.

We have completed this work in Excel 2010 and will be devoting an entire blog post to this subject in the near future.

Macro Recording for Charts and Shapes

When Excel 2007 introduced a new charting engine into the product, the focus was on modernizing chart and shape rendering capabilities. This was a big investment, and we had to make some painful prioritization decisions. This resulted in Excel 2007 losing some compatibility with Excel 2003 in terms of the chart and shape object model, and as a result, macro recording capabilities.

In Excel 2010, macro recording is alive and well again for chart formatting. The vast majority of chart formatting recording capabilities from Excel 2003 are back in, with very few exceptions (axis text formatting being one of them).

We’ve posted several entries in this blog about charting enhancements in Excel 2010.

64-bit Support and Code Compatibility

Office 2010 will ship in both 32- and 64-bit versions. 64-bit is particularly significant to some Excel applications, which hit a wall today in terms of available memory address space. This is an important topic in itself, which we have covered here.

Excel workbooks can be freely opened and edited in both 32- and 64-bit Excel; there is nothing architecture specific in a saved workbook. For custom code solutions, however, 64-bit Excel introduces some challenges:

  • ActiveX controls need to be ported – they need a 64-bit version to work in a 64-bit process. This includes Microsoft’s own controls, of which many have been ported. We are evaluating the popularity and criticality of the remaining ones for possible porting.
  • COM add-ins, similarly, need to be compiled for 64-bit in order to work in 64-bit Excel.
  • XLL add-ins also need to be compiled for 64-bit, and the new Excel 2010 XLL SDK supports that.
  • VBA: Embedded VBA code gets re-compiled when the containing Excel workbook is opened on a new platform; 64-bit Excel now includes a 64-bit version of VBA. So most VBA code just works in 64-bit. However, a subset of VBA solutions needs some tweaking. It has to do with declarations and calls to external APIs with pointers/handles in the parameter list. VBA7, the new version of VBA that ships with Office 2010, supports the development of code that can run in both 32- and 64-bit Office.

To give you a taste of what you may need to modify in VBA code that calls external APIs:

The old 32-bit-only compatible declaration:

Declare Function RegOpenKeyA Lib “advapi32.dll” (ByVal Key As Long, ByVal SubKey As String, NewKey As Long) As Long

Needs to change to this 32- and 64-bit compatible declaration, if you want the code to be used in 64-bit Excel:

Declare PtrSafe Function RegOpenKeyA Lib “advapire32.dll†(ByVal hKey as LongPtr, ByVal lpSubKey As String, phkResult As LongPtr) As Long

MSDN has a detailed article about Compatibility Between the 32-bit and 64-bit Versions of Office 2010.

In addition, we now have a tool that looks for and flags possible points of incompatibility in your code – the Microsoft Office 2010 Code Compatibility Inspector. The tool, among other things, flags 64-bit compatibility issues in your code and hints at the solutions.

Open XML SDK Improvements

Open XML is the default file format of core Office applications starting with Office 2007. The Open XML SDK lets developers create and modify Office files in independent code, without loading the actual Office programs or using their OM. This has significant advantages in many cases, notably for server-based applications.

In April 2009, Office released the first CTP of the Open XML SDK Version 2, which bore important additions over V1 – strongly typed classes and objects; content construction, search, and manipulation; and Office file format schema validation.

In December 2009, leading to the release of Office 2010, we have released the 4th CTP for the Open XML SDK 2.0 for Microsoft Office. There were four major improvements made to the SDK:

  1. Full support for the Office 2010 Open XML formats
  2. Office 2010 schema and semantic level validation
  3. General improvements based on a recent Open XML SDK usability study
  4. Open XML SDK tools improvement

More information is in the linked post.

Resources