Back
Excel

All About File Formats

To this point in the blog, I haven’t talked too much about the file formats that Excel 2007 uses, since Brian Jones has been covering the new Office Open XML format pretty well in his blog.  Today I thought I’d give an overview of the three main formats that we have in Excel 2007.

File Format Number 1 – Office Open XML (XLSX, XLSM, XLTX, XLTM, and XLAM files)
The Office Open XML format is a full fidelity (all features of the product are supported) file format for Excel 2007, and it is the default file format that Excel uses to save new files.  These files are composed of several XML parts, all bundled within a zip-compressed file for efficient storage.  Brian has done a pretty good job of covering the Office Open XML format, and one of my posts a while back listed additional resources if you want to know more about the internals of the format.

To recap a bit, here are some of the interesting bits about this format:

  • This format supports all of the functionality of Excel 2007 – that is, every feature in the product can be saved into this XML format – a major improvement from the Spreadsheet XML format that we introduced in Excel 2002.
  • There are two flavors of the format – one flavor (denoted by ‘X’ at the end of the file extension) that does not support macros (VBA and XLM) contained within the file, and another (with ‘M’ at the end of the file extension) that does support macros within the file.  This differentiation allows Excel to be more secure – if you open a file with an XLSX extension, Excel won’t load the file if it contains macros.  The two flavors also ensure that this is an open XML format – other spreadsheet applications aren’t required to understand VBA and XLM (they can use their own macro language and still generate a valid file.)  With the two flavors come a lot of extensions – here’s a quick reference for the contents:

  • Because this format is comprised of XML and ZIP technologies, it can be generated or opened by applications other than Excel 2007 on platforms other than Windows.  We are spending a lot of resources making sure that the format is well documented so that it can be approved by ECMA as a standard file format for spreadsheet representation.
  • There will be a converter for this format that users of previous versions of Excel can install so that files saved in this format can be opened and saved in previous versions of Excel (though some information may be lost on that conversion because older versions do not support things like more rows and columns).

File Format Number 2 – Excel Binary (XLSB files)
The Excel binary format is the second full fidelity format for Excel 2007.  It is similar to the Office Open XML format in structure – a set of related parts, in a zip container – except that instead of each part containing XML, each part contains binary data.

Even though we’ve done a lot of work to make sure that our XML formats open quickly and efficiently, this binary format is still more efficient for Excel to open and save, and can lead to some performance improvements for workbooks that contain a lot of data, or that would require a lot of XML parsing during the Open process.  (In fact, we’ve found that the new binary format is faster than the old XLS format in many cases.)  Also, there is no macro-free version of this file format – all XLSB files can contain macros (VBA and XLM).  In all other respects, it is functionally equivalent to the XML file format above:

  • File size – file size of both formats is approximately the same, since both formats are saved to disk using zip compression
  • Architecture – both formats use the same packaging structure, and both have the same part-level structures.
  • Feature support – both formats support exactly the same feature set
  • Runtime performance – once loaded into memory, the file format has no effect on application/calculation speed
  • Converters – both formats will have identical converter support

File Format Number 3 – Excel 97-2003 (XLS, XLT, and XLA files)
The last format I’ll talk about is the format that everyone is the most familiar with – our binary XLS file format from current versions of Excel.  This format still exists in Excel 2007, and Excel can save many of the new Excel 2007 features into this format (though only Excel 2007 can view these features – they are not available in earlier versions of Excel).  However, while XLS can still be used in a mixed-Excel environment to share files between users of Excel 2007 and earlier Excel versions, XLS is no longer a full fidelity format for Excel 2007.  With many of the increased limits and new functionality we added, it became apparent that this format just couldn’t scale to meet our needs – hence the move to the formats discussed above.  When users save spreadsheets to this format, if those spreadsheets contain features not supported by the format, they will be presented with a “Compatibility Checker” dialog, alerting them to the functionality that is lost or degraded when the file is saved into the XLS format.

Using the Compatibility Checker, users can get help on many of the issues or navigate to the location in the spreadsheet where the offending feature is used in order to better understand the impact of saving into the XLS file format.

Working with Excel Services
The last thing I wanted to touch on is our file format support in Excel Services.  Users can publish files to Excel Services using either the XLSX format or the XLSB format (as long as there are no macros) and expect the same level of functionality from their workbooks running on the server – though, as I have mentioned, not all Excel features are supported in the first release of Excel Services.

So what format should I use?
Good question … our guidance is as follows:

  • For most users, the Office Open XML format is a good default.  If you write code or record a lot of macros, you may want to switch the default to the macro-enabled version (XLSM).
  • When your scenarios will require the improved open/save performance of the binary file format, use XLSB; otherwise the benefits of the XML format often outweigh the (usually unnoticeable) performance benefit of this format.
  • Use XLS when you are sharing files between Excel 2007 and previous versions of Excel, and want to limit the contents of your file to those features that can still be opened (though not seen/edited in all cases) in previous Excel versions, or when you aren’t sure if everyone who will be working with the file has the converter.  You won’t be able to use all of the new features of Excel, but everyone will be able to open the file.

Hopefully that clears up the various formats that we’ve got for Excel 2007 – I encourage you to try each of the formats and send any feedback that you have.