All About File Formats

To this point in the blog, I haven't talked too much about the file formats that Excel 2007 uses, since Brian Jones has been covering the new Office Open XML format pretty well in his blog.  Today I thought I'd give an overview of the three main formats that we have in Excel 2007.

File Format Number 1 - Office Open XML (XLSX, XLSM, XLTX, XLTM, and XLAM files)
The Office Open XML format is a full fidelity (all features of the product are supported) file format for Excel 2007, and it is the default file format that Excel uses to save new files.  These files are composed of several XML parts, all bundled within a zip-compressed file for efficient storage.  Brian has done a pretty good job of covering the Office Open XML format, and one of my posts a while back listed additional resources if you want to know more about the internals of the format.


To recap a bit, here are some of the interesting bits about this format:

  • This format supports all of the functionality of Excel 2007 - that is, every feature in the product can be saved into this XML format - a major improvement from the Spreadsheet XML format that we introduced in Excel 2002.
  • There are two flavors of the format - one flavor (denoted by 'X' at the end of the file extension) that does not support macros (VBA and XLM) contained within the file, and another (with 'M' at the end of the file extension) that does support macros within the file.  This differentiation allows Excel to be more secure - if you open a file with an XLSX extension, Excel won't load the file if it contains macros.  The two flavors also ensure that this is an open XML format - other spreadsheet applications aren't required to understand VBA and XLM (they can use their own macro language and still generate a valid file.)  With the two flavors come a lot of extensions - here's a quick reference for the contents:


  • Because this format is comprised of XML and ZIP technologies, it can be generated or opened by applications other than Excel 2007 on platforms other than Windows.  We are spending a lot of resources making sure that the format is well documented so that it can be approved by ECMA as a standard file format for spreadsheet representation.
  • There will be a converter for this format that users of previous versions of Excel can install so that files saved in this format can be opened and saved in previous versions of Excel (though some information may be lost on that conversion because older versions do not support things like more rows and columns).

File Format Number 2 - Excel Binary (XLSB files)
The Excel binary format is the second full fidelity format for Excel 2007.  It is similar to the Office Open XML format in structure - a set of related parts, in a zip container - except that instead of each part containing XML, each part contains binary data.


Even though we've done a lot of work to make sure that our XML formats open quickly and efficiently, this binary format is still more efficient for Excel to open and save, and can lead to some performance improvements for workbooks that contain a lot of data, or that would require a lot of XML parsing during the Open process.  (In fact, we've found that the new binary format is faster than the old XLS format in many cases.)  Also, there is no macro-free version of this file format – all XLSB files can contain macros (VBA and XLM).  In all other respects, it is functionally equivalent to the XML file format above:

  • File size - file size of both formats is approximately the same, since both formats are saved to disk using zip compression
  • Architecture - both formats use the same packaging structure, and both have the same part-level structures.
  • Feature support - both formats support exactly the same feature set
  • Runtime performance - once loaded into memory, the file format has no effect on application/calculation speed
  • Converters - both formats will have identical converter support

File Format Number 3 - Excel 97-2003 (XLS, XLT, and XLA files)
The last format I'll talk about is the format that everyone is the most familiar with - our binary XLS file format from current versions of Excel.  This format still exists in Excel 2007, and Excel can save many of the new Excel 2007 features into this format (though only Excel 2007 can view these features - they are not available in earlier versions of Excel).  However, while XLS can still be used in a mixed-Excel environment to share files between users of Excel 2007 and earlier Excel versions, XLS is no longer a full fidelity format for Excel 2007.  With many of the increased limits and new functionality we added, it became apparent that this format just couldn't scale to meet our needs - hence the move to the formats discussed above.  When users save spreadsheets to this format, if those spreadsheets contain features not supported by the format, they will be presented with a "Compatibility Checker" dialog, alerting them to the functionality that is lost or degraded when the file is saved into the XLS format.


Using the Compatibility Checker, users can get help on many of the issues or navigate to the location in the spreadsheet where the offending feature is used in order to better understand the impact of saving into the XLS file format.

Working with Excel Services
The last thing I wanted to touch on is our file format support in Excel Services.  Users can publish files to Excel Services using either the XLSX format or the XLSB format (as long as there are no macros) and expect the same level of functionality from their workbooks running on the server - though, as I have mentioned, not all Excel features are supported in the first release of Excel Services.

So what format should I use?
Good question ... our guidance is as follows:

  • For most users, the Office Open XML format is a good default.  If you write code or record a lot of macros, you may want to switch the default to the macro-enabled version (XLSM).
  • When your scenarios will require the improved open/save performance of the binary file format, use XLSB; otherwise the benefits of the XML format often outweigh the (usually unnoticeable) performance benefit of this format.
  • Use XLS when you are sharing files between Excel 2007 and previous versions of Excel, and want to limit the contents of your file to those features that can still be opened (though not seen/edited in all cases) in previous Excel versions, or when you aren't sure if everyone who will be working with the file has the converter.  You won't be able to use all of the new features of Excel, but everyone will be able to open the file.

Hopefully that clears up the various formats that we've got for Excel 2007 - I encourage you to try each of the formats and send any feedback that you have.

Office Blogs Comments

Comments: (16) Collapse

  • Will you be documenting the binary formats?

  • Chris, yes they will be documented.

  • Dave,

    From what I've seen so far, the Compatibility Checker appears to be well implemented. However, I've come across one annoying compatibility issue related to data imported from an external database.  It goes like this: Open an existing .xls file. On a new worksheet, import data into an Excel Table.  Use the Table name in a lookup formula on another sheet. Save the file (still in .xls format). Reopen the file in Excel 2007. The lookup formulas are all blown to sh*t. Open the file in Excel 2003. The formulas are there, the Table is converted to a QueryTable but now the lookup reference is converted to a static range reference.

    The behavior that should occur: After reopening the .xls file in Excel 2007, the lookup formulas should be intact (perhaps this bug is already fixed in the latest refresh). After opening the file in Excel 2003 (or earlier), the original workbook scoped Table name should be converted to a worksheet scoped QueryTable name, so that the lookup formula still refers to a name reference - just as nature intended. I guess that nobody thought that an imported Table or QueryTable would ever to used in lookup formulas, otherwise they would have recognized the folly in the current design.

  • 2 questions:

    1. Will other formats used by MS Excel also be documented? I am thinking of SYLK in particular. I run into compatibility problems importing Excel SYLK files into other programs. Perhaps missing documentation is the cause? From what I have read, Excel's implementation of SYLK is undocumented. I have found no documentation for it on the MS/MSDN web site. The format may be ancient, but these formats will stay relevant, at least for the couple of years it will take non-MS software to support OpenXML.

    2. Why is XLM still in Excel? VBA superseded it in 1993. It seems like unnecessary--and potentially dangerous--baggage. Do customers still use it?

  • Colin, thanks for the great feedback.  Couple of points.  We need to convert Structured refrences (Table1[Column3]) to range references (a2:a77) on save as XLS, because we do not know what version of Excel the XLS file will be opened in, and current versions of Excel would not know what to do with them.  The QueryTable name is something we thought about, but there again things are a bit problematic.  The QueryTable name we generate is the entire QuerTable, where as the default references created when you are using Tables in 2007 include only the data portion of the table (not the header or total rows).

    Francis, we don't currently have plans to document SYLK files, though I will mention your post to the folks that own this area.  With regards to XLM, it is simply a matter of backwards compatibility.  Many many companies today still have solutions that contain some XLM and who have no desire to have to re-develop those solutions.

  • We need to convert Structured refrences (Table1[Column3]) to range references (a2:a77) on save as XLS, because we do not know what version of Excel the XLS file will be opened in, and current versions of Excel would not know what to do with them.

    Very true. For backward compatibility sake, it might have been wise to provide an option to import into a worksheet range (as a QueryTable). As it is, you're forced to choose between a Table and a PivotTable, even if you don't want to import into either.

  • David, If I save a file containing 2007 features in XLS format, I understand that a 2003 user can open the file but will not have the new features available.

    If that user makes a change and re-saves the file, will my 2007 features be preserved?

  • Hi Dave ... the answer, unfortunately, depends on the feature.  Where it is possible (which is many cases), we preserve features.  One of the team is putting together back compat info which we will make available in a blog post soon.

  • When's the deprecated features list going to be available?

  • When's the deprecated features list going to be available?

  • Harlan - I have stuff lined up for the next few weeks, so sometime in the first half of August with any luck.

  • Re: ". Why is XLM still in Excel? "

    I'm surprised to hear it's still in legacy systems, perhaps migrated from Lotus 123.

    But XL4 code is still used TODAY to get access to some obscure functions via defined names. Such as GET.CELL

    See:

    www.jkp-ads.com/.../ExcelNames08.htm

    Patrick O'Beirne

    www.sysmod.com

  • Question regarding the "Working with Excel Services" note:

    Maybe it's because I'm ignorant about Excel Services but why can't macros be used in it?

  • Did this problem get fixed in 2007?

    I found a bug in Excel copy-paste between two sessions.

    groups.google.com/.../ff1eeb8706d15e3d

    Lars

  • Air_Cooled_Nut - The quick answer is that on the server, you really don't want the security or performance concerns that are inherent in allowing VBA to run.  For more details, you can read up on Excel Services in Dave's posts here: blogs.msdn.com/.../11361.aspx

    Lars - No, that same limitation still exists in Excel 2007.

1 2  Next >