Back
Excel

Office 365 Search is now smarter about spreadsheets

Share on Facebook Share on Twitter Share on Linkedin Share via OneNote Share via Email Print

Steve Kraynak is program manager for Excel Spreadsheet Management.

Businesses use spreadsheets for many critical functions, such as financial reporting and investment modeling, and they need to minimize the risks that using these spreadsheets introduce. The need for strong spreadsheet management is emphasized in these stories compiled by the European Spreadsheet Risk Interest Group (EuSpRIG). Office responded to this need by introducing new features in Office 2013 to help manage the use of spreadsheets for important business functions, and now it has enabled a similar capability in Office 365.

Now you can manage spreadsheets in Office 365

One of the tools introduced in Office 2013 is Discovery and Risk Assessment, a server (on-premises) that helps you find spreadsheets in your organization and determine which present the greatest risk, so you can take steps to reduce your risks. It does this by crawling network shares and SharePoint sites to find the spreadsheets; then analyzes and rates each one for complexity, materiality (impact to the organization) and risk based on configurable criteria. Businesses are using this powerful tool to gain better visibility of their spreadsheet inventory.

You can now gain better visibility into your spreadsheets in Office 365, too. We integrated the spreadsheet risk assessment concepts directly into the enterprise search capability that’s already crawling your document libraries and cataloging your spreadsheets. This means that you can quickly search for spreadsheets based on properties that help you understand their complexity and business impact. In addition to giving you new insights about your spreadsheets, this capability combines with SharePoint Search and other Office 365 features—like eDiscovery, archiving, and data loss prevention—to give your organization much greater control of your information. Because Office 365 does all the heavy lifting, it’s easier for you to leverage this search capability. Plus, we’re always improving it.

Let’s take a closer look at the new properties of this search capability, where you can use them, and how you can use them effectively to improve visibility into your spreadsheets and help manage risk.

New search properties

In Office 365 you can now search for spreadsheets based on the following properties:

  • The number of unique formula sets. A unique formula set is a range of contiguous cells containing the same formula, taking relative references into account. For example, if cell B1 has the formula =A1, and cell B2 has =A2, these are considered the same, because both formulas refer to the cell one column to the left. As shown in the image below, the five  formulas in column C are counted as one unique formula set, because they’re all multiplying the values in columns A and B.

Office 365 Search 1The formulas in column C count as one “unique formula set.”

  • The number of formulas. This is the total number of cells containing formulas.
  • The number of sheets.
  • The number of data connections.
  • The number of linked workbooks.
  • The path and filename of any/all linked workbooks.

These new search properties can be extremely useful, by giving your organization visibility into its spreadsheet inventory that it has never had before. Here are some examples of the questions these properties enable you to ask about your organization’s spreadsheets:

  • Do we have spreadsheets with links to files on someone’s local computer? For example, your colleague may have an important spreadsheet that uses links to other spreadsheets to import some of the data. If the other spreadsheets are stored on their local hard drive, the link will appear broken when you open it. You can easily find workbooks where this is a problem, because you can search for workbooks based on the location of any files they’re linked to (getting data from).
  • How many spreadsheets have data connections (retrieving data from external sources)? Many people use Excel to analyze data from external data sources, like databases, online services, export files and others. Lots of important work goes on in these spreadsheets, but you probably don’t know how many there are or where they are. Now you can find out—easily.
  • If I change my spreadsheet, which other ones are affected? For example, you have a spreadsheet called “ProductList.xlsx” that’s used as a data source by other spreadsheets.  You need to make a change to the product list spreadsheet, and you need to know what other spreadsheets will be affected by the change. Now you can search for any spreadsheets that are linked to “ProductList.xlsx.”
  • Where are our most important spreadsheets? Many organizations have so many spreadsheets that it’s difficult to know which ones need attention first when it comes to better spreadsheet management practices. Since the chance for errors is greater as spreadsheets grow in complexity, it’s useful to know which spreadsheets are more complex than others. The new search capabilities allow you to get an understanding of the complexity at a high level to help you prioritize.

Where to run searches

There are several places in Office 365 where you can use the new search capabilities. Any search box in SharePoint Online will let you use these properties, as well as any other advanced search terms. We have plans to create a search experience in Office 365 to help guide you and make it easy to use these new properties, but for now one of the best ways is to use the eDiscovery Center in your Office 365 admin center. The eDiscovery center lets you create, save and run search queries, and it has some nice features to help you work with the search results, such as exporting a copy of the files.

Here is an example of what queries look like in an eDiscovery center:

Office 365 Search 2

You can use the eDiscovery Center in Office 365 to search for spreadsheets. 

For more information, see Create and run eDiscovery queries in an eDiscovery Center.

Features of the new search properties

Because the new properties go beyond just searching for words or phrases that appear in a spreadsheet, you need to know how to use them effectively.  For the new properties that count items in a workbook, the exact counts are distilled down to just a few ranges, and it’s important to understand how this works so your searches are effective.

Note: These new features are only enabled for XLSX, XLSM, and XLSB files (Office Open XML spreadsheets), so you’ll want to upgrade any older spreadsheets to one of these file formats.

Search based on an approximate number of items
Several of the new search properties let you search based on an approximate number of items found in the workbook.

They are:

Property name in search schema Description
XLFormulaCountRngID Formulas in the workbook (approximate count)
XLLinkedWkbkCountRngID Linked workbooks (approximate count)
XLWorksheetCountRngID Worksheets (approximate count)
XLDataConnCountRngID Data connections (approximate count)
XLUniqueFormulaSetCountRngID Unique formula Sets (approximate count)

Search based on the path and filename of linked workbooks
One of the new properties lets you search based on the path and filename of linked workbooks:

Property naame in search schema Description
XLLinkedWorkbooksText Path and filename of linked workbooks

Search based on the number of formulas in a spreadsheet
To search based on the number of formulas in a spreadsheet, in your search, specify the property XLFormulaCountRngID. For each document, this property has a value that indicates the low end of the range in which the actual formula count falls. For example, if a workbook has 47 formulas, we set the XLFormulaCountRngID=1, because it’s in the range from 1 to 99. So searching for XLFormulaCountRngID=1 would find all workbooks with at least 1 formula, but no more than 99. To find workbooks with zero formulas, you would search for XLFormulaCountRngID=0.

You should always keep in mind the value will be the low end of the range, so if you want to search for spreadsheets with greater than n formulas, you need to find the correct range ID and search for XLFormulaCountRngID >= that range ID.

For XLFormulaCountRngID, search for the following values to find spreadsheets based on how many formulas they have:

Value Means there are n formulas Typical query syntax
0 zero XLFormulaCountRngID=0
1 at least 1 but less than 100 Use XLFormulaCountRngID>0 or XLFormulaCountRngID>=1
100 at least 100 but less than 500 XLFormulaCountRngID>=100
500 at least 500 but less than 1,000 XLFormulaCountRngID>=500
1000 at least 1,000 but less than 5,000 XLFormulaCountRngID>=1000
5000 at least 5,000 but less than 10,000 XLFormulaCountRngID>=5000
10000 at least 10,000 but less than 100,000 XLFormulaCountRngID>=10000
100000  at least 100,000 XLFormulaCountRngID=100000

Typically, you would search for files with greater than or equal to some number of formulas, such as XLFormulaCountRngID>=500. It’s best to stick with the values shown in the table above. The search index won’t have any values for this property other than the ones shown here. For example, if you have a spreadsheet with 750 formulas, it would have a RngID value of 500, so it would not be found if you searched for >=750. Instead, you would find spreadsheets that have at least 1000 formulas, since that’s the next higher range.

Here are all the properties and their possible values:

Property name Range values
XLDataConnCountRngID 0 1 5 10 15 20 25 30
XLWorksheetCountRngID 0 1 5 10 15 20 25 30
XLLinkedWkbkCountRngID 0 1 5 10 15 20 25 30
XLUniqueFormulaSetCountRngID 0 1 50 100 500 1000 5000 10000 50000
XLFormulaCountRngID 0 1 100 500 1000 5000 10000 50000 100000

Search using the SharePoint 2013 Search Query Tool

Another great option is the SharePoint 2013 Search Query Tool, which is a free tool available on CodePlex.  For step-by-step instructions, see the expanded version of this post.

For more information about spreadsheet management in Office, see this Excel blog post.

Are there other things you’d like to know or see about your spreadsheets? Please let us know by commenting on this blog post.

—Steve Kraynak

Top