Back
Excel

Find all your data connections with Office Discovery and Risk Assessment

This post is brought to you by Steve Kraynak, Program Manager for Excel spreadsheet management features. Special thanks to SQL MVP Scott Stauffer for contributing to this article.

Who’s using my data?

Do you need to find out where your data is being used?  More specifically, how can you find Excel workbooks that have data connections to your databases?

We went to SQL MVP Scott Stauffer to help answer that question.  Scott is helping his client launch a data warehouse that will be the preferred source of data for many areas of the business.  The client is also planning to switch to a new Enterprise Resource Planning (ERP) system, which means that legacy systems will eventually be turned off.  A new data warehouse is great, because it can provide a smooth transition from a legacy to a new ERP system, but people have lots of Excel files with connections to the legacy systems.  Scott and his team need a way to find these spreadsheets and their owners to communicate information about the new data warehouse.

The Discovery and Risk Assessment tool

image 1In Office 2013, Microsoft released a tool called Discovery and Risk Assessment (DRA) which can answer these questions.  DRA gives organizations the ability to create an inventory of their Excel and Access files and then analyze them so they can determine which ones are the most complex, impactful, and risky.  DRA does this by crawling specified network paths and SharePoint sites to find the Excel and Access files, and then it analyzes and gives each file a rating for complexity, materiality (impact to the organization), and risk based on configurable criteria.

What’s interesting to the owner of a database or data warehouse is that Discovery and Risk Assessment can tell you about all the data connections found in the Excel files. We’ll show how you how to use DRA to get the list of all the files with data connections and see the Last Saved By property of each file.  With this list, you’ll be able to search the data connections to find which workbooks are connecting to important data sources.

How Discovery and Risk Assessment helps your business

By using Discovery and Risk Assessment to find out who owns the spreadsheets with connections to the old data sources, Scott helps his client to realize the following direct benefits:

  1. Determine how much effort will be required to update important spreadsheets to the new data warehouse.
  2. Proactively encourage and help spreadsheet owners to switch their data connections from legacy systems to the new data warehouse.
  3. Avoid disruption of switching to the new ERP system.
  4. Maximize the return on investment in the new data warehouse.
  5. Achieve one of their main goals, which is to eliminate ODBC connections to the old system prior to cutover.

In addition to these benefits, Scott’s team will be able to improve the quality of spreadsheet usage throughout the company, which is the prime reason that Office released DRA and several other tools in Office.  They will also:

  1. Find out how the spreadsheets are linked to each other.
  2. Determine whether data is up to date among linked spreadsheets.
  3. Identify users who could benefit from additional training in Excel.
  4. Minimize duplication of effort by getting an inventory of all the workbooks in the organization.
  5. Start to introduce policies and best practices to improve the overall quality of Excel usage.

For more information about Discovery and Risk Assessment (DRA), as well as the other spreadsheet management features available in Office and SharePoint, please see our announcement, the overview of the tools, and a spreadsheet management white paper on TechNet.

How to use DRA in your own business

To use Discovery and Risk Assessment in your own business, first set up DRA through the Volume Licensing Service Center for volume customers, then follow the TechNet instructions for installing and configuring it. When you configure DRA, you’ll run a discovery, which means you’ll have a database containing a list of all the Excel files DRA discovered, as well as a wealth of information about each spreadsheet, plus the list of data connections for each file.

To save you the effort of trying to figure out the database schema, follow these steps to bring the list of workbooks and their data connections into Excel.  This example uses Microsoft Power Query for Excel, but you can still use the traditional ways of getting external data into Excel. (Note: It is recommended that you do not perform the following steps on a live production database, since it could impact performance. It is recommended that you use an offline copy of the database.)

1. Go to the Power Query tab, then follow these steps:

a. Choose From Database -> From SQL Server Database.

b. Enter the server name of your SQL Server, including the instance name, if applicable.

c. Enter the database name of the database that DRA is connected to.  You can find this in the Configuration settings in DRA.

d. Expand the SQL Statement box and copy and paste the query below into it.

image 2

Here’s a detail view of the query in the screenshot.

SELECT
DA.AccountName,
FP.Value AS FilePath,
FN.Value AS FileName,
MD.Value AS Modified,
S4.Value AS LastModifiedBy,
N1.Value AS ConnectionCount,
T1.Value AS DataConnections
FROM
DiscoveryAccounts AS DA
INNER JOIN UDKDefinition AS UDK1 ON DA.AccountID = UDK1.AccountID
INNER JOIN UDKDefinition AS UDK2 ON DA.AccountID = UDK2.AccountID
INNER JOIN UDKDefinition AS UDK3 ON DA.AccountID = UDK3.AccountID
INNER JOIN UDKDefinition AS UDK4 ON DA.AccountID = UDK4.AccountID
INNER JOIN UDKDefinition AS UDK5 ON DA.AccountID = UDK5.AccountID
INNER JOIN UDKDefinition AS UDK6 ON DA.AccountID = UDK6.AccountID
INNER JOIN UDKDefinition AS UDK7 ON DA.AccountID = UDK7.AccountID
INNER JOIN Strings AS FP ON UDK1.UDKID = FP.UDKID
INNER JOIN Strings AS FN ON UDK2.UDKID = FN.UDKID AND FN.InventoryVersionID = FP.InventoryVersionID
INNER JOIN Dates AS MD ON UDK3.UDKID = MD.UDKID AND FN.InventoryVersionID = MD.InventoryVersionID
INNER JOIN Strings AS S4 ON UDK4.UDKID = S4.UDKID AND FN.InventoryVersionID = S4.InventoryVersionID
INNER JOIN Numbers AS N1 ON UDK6.UDKID = N1.UDKID AND FN.InventoryVersionID = N1.InventoryVersionID
INNER JOIN Text AS T1 ON UDK7.UDKID = T1.UDKID AND FN.InventoryVersionID = T1.InventoryVersionID
WHERE
UDK1.NAME LIKE ‘FilePath’
AND UDK2.NAME LIKE ‘FileName’
AND UDK3.NAME LIKE ‘Modified’
AND UDK4.NAME LIKE ‘LastModifiedBy’
AND UDK6.NAME LIKE ‘DataConnectionsCount’
AND UDK7.NAME LIKE ‘DataConnections’
AND N1.VALUE > 0
ORDER BY
FP.Value, FN.Value

2. Depending on the size of your database, in a few minutes the Query Editor will appear and you can simply click Apply and Close to send the data to a worksheet.

3. Now you have the data you need in Excel.  You have the file path, file name, modified date, Last Modified By property from the file, and for each file you have a text string that contains all the data connections for the file.

4. You can filter to suit your needs, but for example, if you need to find all workbooks that have a connection to your legacy ERP system, filter the Data Connections column by rows containing “MyERPSystem” (substitute the real name of your ERP system).

image 3

5. Now you have a list of all the files that have a data connection to your ERP system, and you can see who last modified them, so you know who to contact about switching to the new data warehouse.

image 4

Tell us what you think

Let us know in the comments how you use DRA in your own organization.

–Steve Kraynak, Program Manager, Excel spreadsheet management features