You can use your favorite social network to register or link an existing account:
Or use your email address to register without a social network:
Sign in with these social networks:
Or enter your username and password
Forgot your password?
Yes, please link my existing account with for quick, secure access.
No, I would like to create a new account with my profile information.
This blog post is brought to you by Diego Oppenheimer a Program Manager on the Excel team.
I am very happy to be writing this blog post today. Not just because I will be showing you another way Excel can make your data analysis easier but also because I will be introducing the new Data Model and Relationships features that will hopefully change the way you use Excel for data analysis forever.
For those of you who are not familiar with the power and usefulness of Pivot Tables you might want to check out this article (Overview of PivotTable and PivotChart reports) or this training (PivotTable I: Get started with PivotTable reports) . Some of these articles are a bit old but the principles and functionality still apply .
Ok, here we go…
Around this time last year my wife and I were considering purchasing a house in the Seattle area, even if it meant dealing with some of the worst traffic in the US. So like any self-respecting Excel nerd I started a spreadsheet with a table of data that fit our parameters. This data was easy to find on the many real-estate sites out there like Zillow.com or Redfin.com. One thing I noticed though was that none of the these sites by themselves had all the relevant data I wanted to make an informed decision and this is where the Data Model came into play by allowing me to combine data from multiple sources and perform a richer analysis.
You can see what I started with below or just download the workbook for yourself.
If you open the file above you will see I have a table with a lot of data. The first thing I am going to do is create a PivotTable so that I can sift through it easily. Under the INSERT tab, hit PivotTable and the following dialog should pop-up:
I have highlighted a new option in the create PivotTable dialog which is to “Add this data to the Data Model”. So what is this Data Model I speak of?
“A Data Model is a new approach for integrating data from multiple tables, effectively building a relational data source inside the Excel workbook. Within Excel, Data Models are used transparently, providing data used in PivotTables, PivotCharts, and Power View reports“. Read more here…
In other words, the new Data Model allows for building a “model” where data from a lot of different sources can be combined by creating “relationships” between the data sources. For those of you with some database knowledge this is similar to creating joins between tables, except all the tables live in Excel.
I chose to add this data to my Data Model because I am going to be combining it with data I will get from other sources to make my analysis more complete.
The first thing I want to do is look at the number of houses I have selected by zip code. To do this I drag the ZIP field to ROWS and the LISTING ID field to VALUES. By default, this will give me a SUM of LISTING ID’s, but we want a COUNT. To do this, right click on the header that says SUM of LISTING ID -> Value Field Settings… and change to COUNT. As you can see, I only have to sift through 165 house listings now (L).
I have decided to add a couple more fields to my PivotTable to help with my analysis. I added LIST PRICE, DAYS ON MARKET and SQFT and changed the Value Field Settings to AVERAGE.
At this time my PivotTable looked something like this:
And my Field List like this:
So now I have a layout that shows me the number of houses that met my criteria per Zip code and some extra data like Average Price, Square footage and Days on Market (my realtor says this comes in real handy when negotiating).
One thing I noticed about all of the real-estate listing sites is that they give you a ton of detail about the listing but don’t really tell you much about the neighborhood. I want to better understand the demographics of the Zip codes I have selected.
Good thing there is a marketplace for just this type of inquiry (http://datamarket.azure.com). This place is great, and you can read all about it here. Make sure you have a Microsoft/Live-ID or sign up for one free.
Type “Demographics” in the search box and find a data set called “2010 Key US Demographics by ZIP Code, Place and County (Trial)”. If I click on that link to the data set, the first option is to “Explore Data Set” and that’s exactly what I want to do.
Next step is to narrow down the data set to only what is relevant, so I make GeographyType: Zip code and StateAbbreviation “WA” and hit “RUN QUERY”. Great, I think I have now what I need.
To get this into Excel you’re going to need two things from this website, so copy these somewhere:
(1) The URL for the current expressed query
(2) Your private account key
Now that I have all of this figured out, I can easily add what I found to the Data Model. In Excel, go to the DATA tab and select “From Other Sources”, “From Windows Azure Marketplace”.
Fill out the information with what you have saved from the website:
Select that table.
Hit “Finish” and then select “Only Create Connection”:
Note: Some of you might be wondering why I chose “Only Create Connection”. I chose to do so, so that the data was never brought onto the Excel sheet, but directly into the Data Model. I only plan on using it in combination with my original table so there was no need to bring it in as a table onto my sheet or create a new PivotTable or PivotChart. If you’re wondering what that PowerView Report does check out this post by Sean Boon.
If I go back to my original PivotTable, specifically back to the field list, you might have noticed a small change, particularly around the top.
You can see a tab called “Active” (which is selected) and a tab called “All”. The field list is the best way to explore our newly created Data Model, and the “All” tab lets us do just that. Any connection or table that was added to the Data Model will show up in this tab. Let’s take a look.
We can see that our original Table (Table 1) and the Table we brought in from the Data Market are both there. Great!
Now for some magic. I want to know a bit more about each of my Zip Codes, so what I am going to do is add those fields from the demog1 table to my PivotTable. The first one I’ll add is MedianAge2010 (we want to live around people our age so we have a better chance of meeting neighbors we have something in common with).
There are two things to notice here. First, the median age is the same for every zip code, which gives me a hint that something is wrong:
And second, a little message popped up in the field list.
Excel is telling me that a relationship between the two tables might be needed, so I go ahead and create it.
A couple of things to notice when working with relationships:
· Both columns chosen need to contain the same type of data (Zip Code in this case).
· They only work when one of the table’s columns contains unique values.
· The Related Column (Primary) should always be the one containing the unique (no duplicates) values.
· In a PivotTable, to be able to put something from the Related Table on Values you will have to have a field from the Related Table on Rows or Columns.
To satisfy this last condition, I remove ZIP and add GeographyId to Rows. Now my PivotTable looks something like this:
The reason for all those ugly blank spaces is because I have no houses that match my criteria from Table1 for every single Zip Code from the Marketplace data. I can easily filter these out by clicking the downward arrow next to Row Labels, Value Filters, Does Not Equal
Great, now I only have the Zip Codes I care about. I went ahead and added some more fields, conditional formatting and some sorting to figure out what the best zip code for us is and ended up with this:
Seems like 98103 is a really good candidate, falling in the middle in price and Average SQFT but towards the bottom in both Unemployment and % Vacant units. I can go ahead and add a COUNT of ListingIDs and see that I have 8 houses to go check out in this neighborhood that meet my criteria.
In this last example, I really am only scratching the surface of what the Data Model can do and I plan on showing much more in future blog posts. Here are some quick facts though:
I hope this is enough information to make you dangerous with the new Data Model capabilities. Feel free to leave your questions and comments below or message me on Twitter @doppenhe.
Thought I'd have a poke around...
Downloaded the attachment
Opened "FinalDataModelBlogHouseData.xlsx" in Excel 2013
Clicked on PowerPivot ribbon
Clicked "Data Model"
Error: PowerPivot is unable to load the data model.
Seemed to cut off the rest of my comment. needless to say this experience wasn't very encouraging.
kick ass. I add tried to toy with the data model on my own but couldn't get it to work. What I was missing was that little part:
"In a PivotTable, to be able to put something from the Related Table on Values you will have to have a field from the Related Table on Rows or Columns. To satisfy this last condition, I remove ZIP and add GeographyId to Rows."
That opens up quite a few possibilities! It just sucks that the Data Market offering for Canada is... well... limited to say the least.
Not sure what your seeing Jamie but you don't need to click on the Powerpivot ribbon for the Data Model to load. Can you try adding or removing one of the fields from the PivotTable? If that doesnt work feel free to ping me at diego at microsoft dot com.
Nice! Thanks so much! 3erd way to do join in excel
1 MS query (the absolute best way)
2 Power Pivot
3 now in Pivot Table
2-3 are essentially the same. The data analysis engine xVelocity that powered Powerpivot is now built into Excel.
My apologies in advance for a question not strictly related to the topic of this post but I couldn’t find a solution anywhere else and decided to ask here. I’m trying to use local cube files .cub with a 64-bit version of Microsoft Excel 2010 but I haven’t yet managed to open them. I can easily open them with a 32-bit version. I tried different options to open the file:
1) Inserted a new pivot table (Insert Menu) >> selected "Get External Data" >> choose ".cub files" >> selected the cube file >> got the error message: "The data source reference is not valid"
2) Data Menu > From Other Sources > Microsoft Query > Choose Data Source > OLAP Cube tab > New Data Source > name of the data Source = My Cube > select MS OLE DB Provider for Analysis Services 10.0 > Connect > Selected Cube file > browsed to the desired cube file > Finish > the OK button remains grayed out and cannot be selected.
My system: Windows 7 Pro 64-bit, Office 2010 64-bit