Programmability Improvements in Excel 2010

Thanks to Danny Khen for putting together this post.

What’s new?

Excel has been a platform for developing business solutions for a long time now. Developers write code to customize and extend Excel in many ways – automating Excel operations, user interface customizations, data entry forms and extending Excel formulas with user defined functions, to name some key ones. Many developers create very complex applications, which have demanding requirements and push Excel to its limits, so developers are always looking for more ways to get the job done.

With Office 2010 right around the corner, we thought it would be useful to mention the various ways we’ve improved Excel’s extensibility this time around.

I’ll touch the following in this post:

  • Asynchronous UDFs (user-defined functions)
  • Running UDFs remotely on an HPC (high-performance computing) cluster
  • Migrating Excel 4 Macros to VBA
  • Macro recording for charts and shapes
  • 64-bit support and code compatibility
  • Open XML SDK improvements

Asynchronous UDFs (User-Defined Functions)

Say you have a UDF that takes a long time to complete, but does not actually perform local intensive processor operations. For example, a UDF that brings back a stock quote from a public web service. An Excel calculation thread calls UDFs in a serial fashion, one after the other, waiting for each call to complete before proceeding down the calculation chain. Therefore, if the UDF is called many times on a worksheet, a calculation cycle could needlessly take a very long time – an order of N of the time of a single call completion. Multithreaded recalculation will not do the trick either. It cannot scale very well for this case, because multiple threads have their overhead, and therefore it is not practical to define as many threads as you have calls to the UDF on the sheet.

Excel 2010 enables you to author these non-processor-intensive UDFs as asynchronous. This ability is supported in XLL add-ins, and the new Excel 2010 SDK has all you need for writing asynchronous UDFs. Essentially, you break your UDF in two: a synchronous function call, which sets up the asynchronous calculation, data request, external web service call etc., and returns immediately; and an asynchronous part, which returns the result to Excel when it is ready. Excel tracks pending UDF calls that have not yet completed, and continues independent parts of the calculation. At some point, the UDF call result becomes available to the add-in (this is typically triggered by an event, such as an incoming web service result, an asynchronous I/O operation completion, etc.). The add-in then calls back into Excel with the UDF result.

Sounds complex, but it’s really not that hard (assuming you know your way around XLL development). Let’s say your XLL add-in has a “normal”, synchronous UDF, which takes one pointer to an XLOPER12, and returns the same:

LPXLOPER12 WINAPI MyNormalFunc(LPXLOPER12 oper)

In order to turn it into the first (callable) part of an asynchronous UDF, you remove the return value (since the result is going to be returned asynchronously later on), and you add a trailing parameter (we will explain its purpose in a bit):

void WINAPI MyAsyncFunc(LPXLOPER12 oper, LPXLOPER12 asyncHandle)

To register this asynchronous UDF, you designate the return value type (void) with a “>” in the function’s type string – the string where a character code specifies the type of each parameter:

...

LPXLOPER12 pxTypeText = L">QX";

...

Excel12(xlfRegister, &xResult, 5, &xModuleText, pxProcedure,

pxTypeText, pxFunctionText, pxArgumentText);

The function’s new trailing argument is an XLOPER of type xltypeBigData and is used for an asynchronous call handle. It is invisible on the UDF call in the Excel formula. The Excel model author calls it with a single argument, just like the original synchronous function used to be called. Excel passes a unique handle to the function for each call during a calculation cycle, and uses the handle to track its progress. Your add-in should keep track of each instance of this handle. Later on, when the result of this particular UDF call is ready, your add-in code uses the handle to call back into Excel, using the new callback xlAsyncReturn, in order to provide the result (and after this point, the add-in can discard the handle). In our example, this should look something like the following:

XLOPER12 xlResult;

LPXLOPER12 pxHandle; // make it point to the async handle tracking the call

LPXLOPER12 pxResult; // make it point to your UDF call result

...

Excel12(xlAsyncReturn, &xlResult, 2, pxHandle, pxResult);

The benefit? A big boost in calculation speed – an order of one calculation time, instead of an order of N. If a single stock quote takes 1 second to come back from a web service, and you have 100 calls (assuming abundant resources on the web service side), this means 1 sec instead of 100 sec – really significant.

The beta version of the new Excel 2010 XLL SDK is available online here. It has all the source files and documentation you need to create asynchronous UDFs.

Running UDFs Remotely on an HPC (High-Performance Computing) Cluster

Many organizations with high computation power demands have high-performance compute clusters. These are large arrays of processors (“nodes”), with infrastructure to submit computation jobs to the cluster nodes and collect results. Highly concurrent calculation models can greatly benefit from HPCs.

Say you have a UDF with a very heavy calculation, called many times on some Excel worksheets, with the calls mostly independent from one another. Turning the UDF into asynchronous will not really help calculation speed; all calls still need to calculate on however many processors your computer has, whether synchronously or not. But if you have an HPC cluster at your disposal, and if you could send those function calls off to be calculated on the cluster, it would greatly boost the calculation speed of your sheet. Creating and managing HPC jobs, however, is not an easy task and could require nontrivial coding.

Enter Excel 2010. HPC admins can now deploy XLL add-ins with supported UDFs (a word on that in a bit) to the cluster’s compute nodes. Excel users can configure a compute cluster to be used in calculations, using simple UI in Excel’s options:

image

The user turns on “Allow user-defined XLL functions to run on a compute cluster”, selects the type (typically there would be one – the type of HPC cluster that the organization has), and, in the “Options” dialog, sets the host name for the cluster’s head node.

With that in place, Excel automatically sends UDF calls off to the cluster asynchronously, waits for results, and plugs them back into the Excel model – all without the need to modify anything in the Excel model!

Supported UDFs are XLL UDFs, which are registered by the add-in as “cluster-safe,” meaning that they are essentially independent calculations – they are stateless, and do not call back into Excel. A cluster-safe UDF should not perform any operations that rely on the state of the local machine (storing off-process data, open connections, etc.). Once a developer verifies that a UDF is cluster-safe, registering the UDF as such is just a tiny modification to the add-in code – adding another flag to the registration call. There is no need to modify anything in the function code itself to support communication with the HPC cluster or management of the remote call. All there is to do is to add an “&” to the registration’s type string:

...

LPXLOPER12 pxTypeText = L"QQ&";

...

Excel12(xlfRegister, &xResult, 5, &xModuleText, pxProcedure,

pxTypeText, pxFunctionText, pxArgumentText);

Obviously, if your existing UDF is not cluster-safe, you’d need to rewrite it in such a way that it is, in order to register it as cluster-safe and take advantage of automatic call offloading to HPC clusters.

To summarize: Cluster-safe UDFs are offloaded to a high performance compute cluster, without the need to modify the Excel model, and without the need to rewrite the UDFs to use a cluster. Pretty neat!

Look for a dedicated post about this feature on this blog in the near future.

Migrating Excel 4 Macros to VBA

Prior to the introduction of VBA into Excel, we had our own macro facility – known as Excel 4 macros, used in Excel macro sheets. Most people have long since migrated their Excel 4 macros to VBA; however, some Excel 4 macro capabilities were missing from VBA, which made this migration difficult.

In Excel 2010, one of our goals was to remove any remaining barriers that people had to complete the migration of Excel 4 macros to VBA. This work generally fell into two categories:

  1. Adding new objects, methods and properties to Excel’s object model for the few remaining Excel 4 macro functions that did not have equivalents in VBA.
  2. Performance Improvements for Page Setup operations in VBA to provide similar performance to PAGE.SETUP() functionality in Excel 4 Macros.

We have completed this work in Excel 2010 and will be devoting an entire blog post to this subject in the near future.

Macro Recording for Charts and Shapes

When Excel 2007 introduced a new charting engine into the product, the focus was on modernizing chart and shape rendering capabilities. This was a big investment, and we had to make some painful prioritization decisions. This resulted in Excel 2007 losing some compatibility with Excel 2003 in terms of the chart and shape object model, and as a result, macro recording capabilities.

In Excel 2010, macro recording is alive and well again for chart formatting. The vast majority of chart formatting recording capabilities from Excel 2003 are back in, with very few exceptions (axis text formatting being one of them).

We’ve posted several entries in this blog about charting enhancements in Excel 2010.

64-bit Support and Code Compatibility

Office 2010 will ship in both 32- and 64-bit versions. 64-bit is particularly significant to some Excel applications, which hit a wall today in terms of available memory address space. This is an important topic in itself, which we have covered here.

Excel workbooks can be freely opened and edited in both 32- and 64-bit Excel; there is nothing architecture specific in a saved workbook. For custom code solutions, however, 64-bit Excel introduces some challenges:

  • ActiveX controls need to be ported – they need a 64-bit version to work in a 64-bit process. This includes Microsoft’s own controls, of which many have been ported. We are evaluating the popularity and criticality of the remaining ones for possible porting.
  • COM add-ins, similarly, need to be compiled for 64-bit in order to work in 64-bit Excel.
  • XLL add-ins also need to be compiled for 64-bit, and the new Excel 2010 XLL SDK supports that.
  • VBA: Embedded VBA code gets re-compiled when the containing Excel workbook is opened on a new platform; 64-bit Excel now includes a 64-bit version of VBA. So most VBA code just works in 64-bit. However, a subset of VBA solutions needs some tweaking. It has to do with declarations and calls to external APIs with pointers/handles in the parameter list. VBA7, the new version of VBA that ships with Office 2010, supports the development of code that can run in both 32- and 64-bit Office.

To give you a taste of what you may need to modify in VBA code that calls external APIs:

The old 32-bit-only compatible declaration:

Declare Function RegOpenKeyA Lib "advapi32.dll" (ByVal Key As Long, ByVal SubKey As String, NewKey As Long) As Long

Needs to change to this 32- and 64-bit compatible declaration, if you want the code to be used in 64-bit Excel:

Declare PtrSafe Function RegOpenKeyA Lib “advapire32.dll” (ByVal hKey as LongPtr, ByVal lpSubKey As String, phkResult As LongPtr) As Long

MSDN has a detailed article about Compatibility Between the 32-bit and 64-bit Versions of Office 2010.

In addition, we now have a tool that looks for and flags possible points of incompatibility in your code – the Microsoft Office 2010 Code Compatibility Inspector. The tool, among other things, flags 64-bit compatibility issues in your code and hints at the solutions.

Open XML SDK Improvements

Open XML is the default file format of core Office applications starting with Office 2007. The Open XML SDK lets developers create and modify Office files in independent code, without loading the actual Office programs or using their OM. This has significant advantages in many cases, notably for server-based applications.

In April 2009, Office released the first CTP of the Open XML SDK Version 2, which bore important additions over V1 – strongly typed classes and objects; content construction, search, and manipulation; and Office file format schema validation.

In December 2009, leading to the release of Office 2010, we have released the 4th CTP for the Open XML SDK 2.0 for Microsoft Office. There were four major improvements made to the SDK:

  1. Full support for the Office 2010 Open XML formats
  2. Office 2010 schema and semantic level validation
  3. General improvements based on a recent Open XML SDK usability study
  4. Open XML SDK tools improvement

More information is in the linked post.

Resources

Office Blogs Comments

Comments: (30) Collapse

  • it's Sounds exciting.

  • Hello Joseph and Danny,

    Thank you for the information.

    We are currently using the Office XP Developer edition to create COM Add-ins, with which we support Excel 2000, 2002/XP, 2003, 2007 and 2010 (32 bit) with one single COM add-in.

    However I don't believe we can compile our program as a 64 bit COM add-in with the Office XP Developer edition, or am I missing something?

    What will be the best approach to upgrade our add-in to a 64 bit COM add-in? For example will there become an update for the existing developer edition or perhaps a new Office Developer edition? Or are we forced to abandon this and use another program (Visual Studio 2010?) instead to create a 64 bit COM add-in?

    Thank you in advance,

    Kind regards,

    Bastien Mensink

  • Interesting to know that msft still supports the migration from XL4/XLM (not XML) to VBA... It's been long since last time I used XLM, and switched all our development to VBA once it was released.

  • With respect to XLM, does this mean you'll be keeping or eliminating the ExecuteExcel4Macro method of the Excel OM's Application class? I ask because VBA provides no other way to fetch values from *CLOSED* workbooks other than the extremely crude kludge of using worksheet cell formulas, e.g.,

    With Range("X99")

     .Formula = "='D:\work\foo\[bar.xls]A'!B3"

     x = .Value2

     .ClearContents

    End With

  • Speaking of Shapes, it seems like you have dropped support for the sheet's Protect method with UserInterfaceOnly = True.

    In xl2003: you could use the Protect method of a sheet with the param UserinterfaceOnly=True. Shapes could then modified without unprotecting/reprotecting the sheet each time.

    In xl2007: part of this Protect functionality was dropped. Some of the Shapes methods required unprotecting/reprotecting the sheet each time. However, the Protect with UserIntefaceOnly would work on some methods.

    In xl2010: after testing a little bit, seems like there is no more support for UserInterfaceOnly on shapes. Am I correct?

    Eg: to just Read Shape.AlternativeText you need to unprotect the sheet.

  • When we call user defined functions from cells, with it display the parameters needed, just like when I call a normal excel function?  This would be EXTREMELY helpful!!!

    Also, will it ever be possible to display an enumeration drop-down when passing function parameters from a cell??  In VBA these lists are displayed as we give the parameters, but they are not available to the worksheet.  Providing users with a list of available parameters would be beyond beneficial!!

  • Harlan, Excel 2010 will continue to support the ExecuteExcel4Macro method.  Our goal longterm is to eliminate the need for it.  For what scenarios do you find yourself fetching values from closed workbooks?  As indicated there will be a follow-up blog post with more details about this work.

    Thanks,

    Eric

  • Joshua: I couldn't agree more... We know of these two requirements and they are on our "things we'd really like to do" list. We'll definitely look into them as part of future investments in Excel as a programmable platform.

    Thanks!

    Danny

  • "Migrating Excel 4 Macros to VBA"

    Will this be 100%.... So for eg will there be an equivalent to EVALUATE

    If yes.. congrats its take you 16 years to complete the migration.

    @Bastien

    The question we should ask is why did MS stop the "Developer" version of Office after XP

    Answer :

    a) More "$". Who would go to VS/VSTO if you could create COM Addins right within Excel

    b) After 2003 MS stopped considering Office as a "Serious" Development platform and went with the theory that give "Control" back to IT and you could push products more easily

  • Bastien:

    Don't have the answer to your question, but I for one have started porting one of my VB6 add-ins to VSTO 2008 to be able to support 64 bit.

  • @Eric -

    Accessing closed workbooks is an infrequent need, but not uncommon. Check your Excel newsgroups to see that this comes up a few times a month.

    When there's a need to access data or formula resorts stored in other workbooks - one workbook from many using the same template, so the filename becomes a variable while the ranges/names in the file are the same for all such files - the filename becomes a variable. The options are as follows.

    1. Use dummy filenames in external references, e.g.,

    ='X:\Y\[Z.xls]SomeWorksheet'!$G$11

    then search and replace to change dummy filenames to actual filenames. Automating this requires macros.

    2. Use INDIRECT; if the actual filename were in a cell named FILENAME, this would look like

    =INDIRECT("'["&FILENAME&"]SomeWorksheet'!$G$11")

    The downside to this is that the file would need to be open. When FILENAME is truly variable, automating this requires macros to open the file when there's an entry in FILENAME.

    3. Use a 3-step formula creation process. Step 1 uses formulas like

    ="='X:\Y\["&FILENAME&"]SomeWorksheet'!$G$11"

    Step 2 converts these formulas to values when there's an entry in FILENAME, producing text constants that look like external reference formulas. Step 3 converts these text constants for formulas by find and replace, finding = and replacing it with = . Automating this requires macros.

    4. Use the old XLODBC add-in to use its SQL.REQUEST function to fetch data from Excel workbooks as ODBC data sources. This handles automation, but requires installing an add-in MSFT seems to have abandoned after Excel 2002.

    5. Use udfs, very, very inefficient udfs, which create separate Excel application instances, then call those other Application instances' ExecuteExcel4Macro with constructed external references. This doesn't require macros per se (i.e., no Sub, just Function), and it doesn't require add-ins, so easily automated AND distributed.

    Lotus 1-2-3's @@ function and Quattro Pro's equivalent, which are both roughly equivalent to Excel's INDIRECT, can (yes, present tense since I still use 1-2-3) fetch values from closed files. Excel's INDIRECT can't. This has bee a real PITA for more than two decades and counting.

  • Harlan never seems to want to mention my technique for variable links, published over 10 years ago.

    POWER FORMULA TECHNIQUE

    Created by Shane Devenshire and David Hager

    The problem - to change a link in a formula without changing the

    formula. This can be done with the INDIRECT function by creating a

    concatenated string with input from several worksheet cells which

    contain workbook (in A1) and worksheet (in B1) names.

    =INDIRECT("'["&A1&"]"&B1&"'!A1")

    Unfortunately, this type of formula will only work if the referenced

    workbook is open. Shane came up with part of the solution to this

    problem when he discovered that the INDEX function can return a linked

    cell value from a hard-coded link range. For example, if you define a

    range as "ref1", where the linked range formula is:

    =[Book1.xls]Sheet1!$1:$65536

    (A smaller range starting at A1 can also by used.)

    then you can use the formula:

    =IF(ISERR(INDEX(ref1,ROW(),COLUMN())),"",INDEX(ref1,ROW(),COLUMN()))

    in any cell and the returned value will be from the same cell in Book1.xls

    on Sheet1. Then, variable links to this formula can be made by changing the

    link range as referred to in a named formula. This formula is of the form:

    =CHOOSE(Sheet2!$A$1-29*INT((Sheet2!$A$1-1)/29),ref1,ref2,...,ref29)

    where Sheet2!$A$1 is an input cell for values from 1 to n which represent

    a particular link stored as a defined name. As you are probably aware, the

    CHOOSE can only accept 29 arguments. However, there is a workaround for

    this limitation, and the formula in the 1st argument is part of that

    process. It converts the value in Sheet2!$A$1 into a number between 1 and

    29. Then, if you define the preceding formula as oref1 (and other similar

    formulas as oref(n)), you can use the following master formula:

    =CHOOSE(INT((Sheet2!$A$1-1)/29)+1,oref1,oref2,...,oref29)

    Now, if you give this formula a defined name (say mref), then the resulting

    "omnireference" can be used in place of ref1 in Shane's formula to produce

    an "omnilink" that is capable of returning values from 29 x 29 (841)

    different links. This formula is the one that is finally entered in a

    worksheet cell.

    =mref

    Note: This technique works great as long as the linked files are not moved,

    renamed or deleted.

  • Bastien:

    You are right, VB6 does not support compiling to 64 bit (either within Office XP Developer or in the separate VS product with VB6).

    I think that the best approach, in terms of longevity of your code, would be to migrate it to a .NET language (VB.NET, C#), as an add-in developed in VS 2008/2010. If you have performance-sensitive parts, e.g. some functions that are called repeatedly from the Excel sheet, you may want to consider implementing them in an XLL add-in (with the new XLL SDK for 2010, you can comile to both 32- and 64-bit).

    Danny

  • @David -

    I've seen and used CHOOSE to handle multiple *STATIC* external references. I'm going to continue to DISMISS this technique because it doesn't handle DYNAMIC references. I'm going to focus on techniques to reference files that OTHER users create (from a given template) AFTER I create my file which references files of that template.

    I don't know when I write MY formulas what the drive, directory or base filename would be. I only know worksheet names/ranges or defined names. I need to handle any file pathname. Even if I only had to worry about 8-char filenames where those 8 chars could only be alphanumeric, that'd be 36^8 = 2,821,109,907,456 possible filenames (and that doesn't even allow for 7- or fewer char filenames). Got a CHOOSE formula to handle 3 trillion possible filenames?

    Your approach doesn't do the same thing. Your approach is useful when the possible filenames are known in advance. It's useless when the filenames aren't. I concentrate on the latter case.

  • Will it be possible to run Excel calculations on GPUs in conjunction with CUDA?

1 2  Next >

Comments: (loading) Collapse