Back
Excel

Introduction to the Data Model and Relationships in Excel 2013

 

This blog post is brought to you by Diego Oppenheimer a Program Manager on the Excel team.

 

I am very happy to be writing this blog post today. Not just because I will be showing you another way Excel can make your data analysis easier but also because I will be introducing the new Data Model and Relationships features that will hopefully change the way you use Excel for data analysis forever.

For those of you who are not familiar with the power and usefulness of Pivot Tables you might want to check out this article (Overview of PivotTable and PivotChart reports) or this training (PivotTable I: Get started with PivotTable reports) . Some of these articles are a bit old but the principles and functionality still apply .

Ok, here we go…

Finding a home

Around this time last year my wife and I were considering purchasing a house in the Seattle area, even if it meant dealing with some of the worst traffic in the US. So like any self-respecting Excel nerd I started a spreadsheet with a table of data that fit our parameters. This data was easy to find on the many real-estate sites out there like Zillow.com or Redfin.com. One thing I noticed though was that none of the these sites by themselves had all the relevant data I wanted to make an informed decision and this is where the Data Model came into play by allowing me to combine data from multiple sources and perform a richer analysis.

You can see what I started with below or just download the workbook for yourself.

clip_image002

My first look at the Data Model

If you open the file above you will see I have a table with a lot of data. The first thing I am going to do is create a PivotTable so that I can sift through it easily. Under the INSERT tab, hit PivotTable and the 
following dialog should pop-up:clip_image004

I have highlighted a new option in the create PivotTable dialog which is to “Add this data to the Data Model”. So what is this Data Model I speak of?

A Data Model is a new approach for integrating data from multiple tables, effectively building a relational data source inside the Excel workbook. Within Excel, Data Models are used transparently, providing data used in PivotTables, PivotCharts, and Power View reports“. Read more here…

In other words, the new Data Model allows for building a “model” where data from a lot of different sources can be combined by creating “relationships” between the data sources. For those of you with some database knowledge this is similar to creating joins between tables, except all the tables live in Excel.

I chose to add this data to my Data Model because I am going to be combining it with data I will get from other sources to make my analysis more complete.

Create the PivotTable

The first thing I want to do is look at the number of houses I have selected by zip code. To do this I drag the ZIP field to ROWS and the LISTING ID field to VALUES. By default, this will give me a SUM of LISTING ID’s, but we want a COUNT. To do this, right click on the header that says SUM of LISTING ID -> Value Field Settings… and change to COUNT. As you can see, I only have to sift through 165 house listings now (L).

I have decided to add a couple more fields to my PivotTable to help with my analysis. I added LIST PRICE, DAYS ON MARKET and SQFT and changed the Value Field Settings to AVERAGE.

At this time my PivotTable looked something like this:

clip_image006

And my Field List like this:clip_image007

So now I have a layout that shows me the number of houses that met my criteria per Zip code and some extra data like Average Price, Square footage and Days on Market (my realtor says this comes in real handy when negotiating).

What do I know about the Zip codes?

One thing I noticed about all of the real-estate listing sites is that they give you a ton of detail about the listing but don’t really tell you much about the neighborhood. I want to better understand the demographics of the Zip codes I have selected.

Good thing there is a marketplace for just this type of inquiry (http://datamarket.azure.com). This place is great, and you can read all about it here. Make sure you have a Microsoft/Live-ID or sign up for one free.

Type “Demographics” in the search box and find a data set called “2010 Key US Demographics by ZIP Code, Place and County (Trial)”. If I click on that link to the data set, the first option is to “Explore Data Set” and that’s exactly what I want to do.

Next step is to narrow down the data set to only what is relevant, so I make GeographyType: Zip code and StateAbbreviation “WA” and hit “RUN QUERY”. Great, I think I have now what I need.

clip_image009
To get this into Excel you’re going to need two things from this website, so copy these somewhere:

(1) The URL for the current expressed query

(2) Your private account key

Now that I have all of this figured out, I can easily add what I found to the Data Model. In Excel, go to the DATA tab and select “From Other Sources”, “From Windows Azure Marketplace”.

Fill out the information with what you have saved from the website:

clip_image010

clip_image011
Select that table.

Hit “Finish” and then select “Only Create Connection”:

clip_image012

Note: Some of you might be wondering why I chose “Only Create Connection”. I chose to do so, so that the data was never brought onto the Excel sheet, but directly into the Data Model. I only plan on using it in combination with my original table so there was no need to bring it in as a table onto my sheet or create a new PivotTable or PivotChart. If you’re wondering what that PowerView Report does check out this post by Sean Boon.

Combining my Data

If I go back to my original PivotTable, specifically back to the field list, you might have noticed a small change, particularly around the top.clip_image013

You can see a tab called “Active” (which is selected) and a tab called “All”. The field list is the best way to explore our newly created Data Model, and the “All” tab lets us do just that. Any connection or table that was added to the Data Model will show up in this tab. Let’s take a look.

clip_image014

We can see that our original Table (Table 1) and the Table we brought in from the Data Market are both there. Great!

Now for some magic. I want to know a bit more about each of my Zip Codes, so what I am going to do is add those fields from the demog1 table to my PivotTable. The first one I’ll add is MedianAge2010 (we want to live around people our age so we have a better chance of meeting neighbors we have something in common with).

There are two things to notice here. First, the median age is the same for every zip code, which gives me a hint that something is wrong:

clip_image016

And second, a little message popped up in the field list.

clip_image017

Excel is telling me that a relationship between the two tables might be needed, so I go ahead and create it.

clip_image019

A couple of things to notice when working with relationships:

· Both columns chosen need to contain the same type of data (Zip Code in this case).

· They only work when one of the table’s columns contains unique values.

· The Related Column (Primary) should always be the one containing the unique (no duplicates) values.

· In a PivotTable, to be able to put something from the Related Table on Values you will have to have a field from the Related Table on Rows or Columns.

To satisfy this last condition, I remove ZIP and add GeographyId to Rows. Now my PivotTable looks something like this:

clip_image021

The reason for all those ugly blank spaces is because I have no houses that match my criteria from Table1 for every single Zip Code from the Marketplace data. I can easily filter these out by clicking the downward arrow next to Row Labels, Value Filters, Does Not Equal

clip_image022

Great, now I only have the Zip Codes I care about. I went ahead and added some more fields, conditional formatting and some sorting to figure out what the best zip code for us is and ended up with this:

image

Seems like 98103 is a really good candidate, falling in the middle in price and Average SQFT but towards the bottom in both Unemployment and % Vacant units. I can go ahead and add a COUNT of ListingIDs and see that I have 8 houses to go check out in this neighborhood that meet my criteria.

More Details on the Data Model

In this last example, I really am only scratching the surface of what the Data Model can do and I plan on showing much more in future blog posts. Here are some quick facts though:

  • Any given workbook will only have one Data Model.
  • Any table in Excel can be added to the Data Model.
  • Almost all Data Sources can be added to the Data Model (SQL, Odata, Atom feeds, Excel tables and more).
  • Tables in the Data Model have no limit in terms of rows.
  • Relationships can be defined across multiple tables.

I hope this is enough information to make you dangerous with the new Data Model capabilities. Feel free to leave your questions and comments below or message me on Twitter @doppenhe.

Join the conversation

9 comments
  1. Thought I’d have a poke around…
    Downloaded the attachment
    Extracted
    Opened "FinalDataModelBlogHouseData.xlsx" in Excel 2013
    Clicked on PowerPivot ribbon
    Clicked "Data Model"
    Error: PowerPivot is unable to load the data model.

    • Seemed to cut off the rest of my comment. needless to say this experience wasn’t very encouraging.

  2. Nice! Thanks so much! 3erd way to do join in excel
    1 MS query (the absolute best way)
    2 Power Pivot
    3 now in Pivot Table

    • 2-3 are essentially the same. The data analysis engine xVelocity that powered Powerpivot is now built into Excel.

  3. My apologies in advance for a question not strictly related to the topic of this post but I couldn’t find a solution anywhere else and decided to ask here. I’m trying to use local cube files .cub with a 64-bit version of Microsoft Excel 2010 but I haven’t yet managed to open them. I can easily open them with a 32-bit version. I tried different options to open the file:
    1) Inserted a new pivot table (Insert Menu) >> selected "Get External Data" >> choose ".cub files" >> selected the cube file >> got the error message: "The data source reference is not valid"
    2) Data Menu > From Other Sources > Microsoft Query > Choose Data Source > OLAP Cube tab > New Data Source > name of the data Source = My Cube > select MS OLE DB Provider for Analysis Services 10.0 > Connect > Selected Cube file > browsed to the desired cube file > Finish > the OK button remains grayed out and cannot be selected.
    My system: Windows 7 Pro 64-bit, Office 2010 64-bit

  4. I completely agree with you. I really like this article. It contains a lot of useful information. I can set up my new idea from this post. It gives in depth information. Thanks for this valuable information for all. And of course nice review about the application.
    So informative and interesting post have been shared here. It’s very nice website. I will search this page again & again great list. I appreciate your efforts to bring such a huge list for us.
    pancake mix

  5. I get the: "We couldn’t load the Data Model. This may be because the Data Model in this workbook is damaged." error :-(

  6. Unfortunately the azure datamarket data “2010 Key US Demographics by ZIP Code, Place and County (Trial)” is "not available in your market" – probably meaing "out-side USA"?

Comments are closed.