Today’s author, Dan Battagin, a Program Manager on the Excel team, talks about joining two tables in Excel (a.k.a. returning multiple rows for VLOOKUP).
Today, we’ll take a look at the VLOOKUP function, and work out a way to get around its major drawback – it returns only a single value that matches the lookup.
If you’re like me, there are times when you have a big table of data which is pretty well normalized, and you want to pull all of the rows out of that table that match a certain criteria. Well, it’s easy to get the first match in that table by using VLOOKUP (in fact, it’s easier than ever with the new structured references in Excel 2007):
|Figure 1: VLOOKUP is an easy way to return a value out of the first row that contains a customer name.|
As you can see above, with structured references in Excel 2007, not only is VLOOKUP easy to use, but it’s actually pretty readable – we’re matching the value of A9 in the first column of Table1 and returning the value from the 3rd column ($25.00). That’s really nice, and very useful (especially if you use VLOOKUP as part of a calculated column in a table) but it does have the drawback that it doesn’t ever allow us to return any of the values for the second “Dan” in the list. And, if you’re like me, this can be pretty frustrating. I’m constantly working with lists of Excel bugs (yep, we’ve got some bugs), materials lists for woodworking projects that I’m currently involved in, price lists for whatever current home remodel I’m working on, etc. where I really need to see more than just that first row.
So today, let’s look at how we can actually make this work, and retrieve all of the values from a table that match a certain criteria.
Before we get started on the solution, let’s lay out the data that we’re going to use. Since it’s always a pain to come up with data sets for work examples like this, I’m going to use an old standby – the Northwind Sample Database that ships with pretty much every version of Access. Of course, this being Excel, I’m going to copy the portion of the data that I need into worksheets. (Yeah, yeah, I’m a database guy too, and I know we could do the joins in SQL, but that’s not always the case, so bear with me.) OK, so here’s the data I’m going to copy onto sheets in Excel:
So you don’t have to do this yourself, I’ve also made a copy of the spreadsheet available so you can just follow along (see the attachments at the bottom of this post). Of course, if you want to walk through this yourself, go right ahead, that’s a pretty cool compliment.
So now that we’ve got all of our source data, I want to put together the actual Master-Detail form. Nothing too fancy, but basically I want to select the Order Number that I want to look at, and see some details about that order (who ordered it, order status, etc.) as well as all of the items that are part of that order (item name, price, quantity, etc.). Pretty standard stuff. Here’s our target end result:
Figure 5: The master-details form where I can pick an order number and see all of the details.
In the available workbook, you’ll find this form put together already, but if you’re doing it yourself, here are the steps you’ll want to take (note that these steps skip the formatting of this form, since not everyone likes green the way I do):
- Enter the following static strings in the sheet:
- B1: Order Number
- C3: Order Information
- C4: Customer
- F4: Order Date
- F5: Status
- F6: Salesperson
- F7: Ship Date
- C10: Order Details
- C11: Product
- D11: Quantity
- E11: Unit Price
- F11: Discount
- G11: Total Price
- H11: Status ID
- Create a Named Range that includes all of your Order IDs, which we’ll use to create the Order ID drop down using Data Validation. To do this, click on the Formulas tab | Define Name and enter:
- Name: OrderIds
- Refers to: =tblOrders[ID]
Tip: Data Validation cannot refer to ranges on a different sheet than the one that contains the validation, but by using a named range, you can get values from another sheet (and in this case, get a dynamic list of values using the structured reference to get an entire table column, even if new values are added to it in the future)
- Select D1 and name it rngOrderId (we’ll use this later).
- With D1 still selected, click on the Data tab | Data Validation and create a new List type validation with Source: “=OrderIds”. Boom, now you have your drop down.
OK, now we’re going to create some “normal” VLOOKUPS as we discussed above already, with a bit of a twist to make them a bit more robust: we’re going to use the MATCH function instead of a hard-coded column number for the value we want to retrieve from the source table.
- Select C5 and enter the following function:
=VLOOKUP(rngOrderId, tblOrders, MATCH(C4, tblOrders[#Headers], 0), FALSE)
Here we’ve said find the Order ID that I’ve selected in the Orders table, and return the value from the column that MATCHes Customer.
- Just like C5, we’re going to setup the rest of the normal VLOOKUP functions to return metadata about the order – this is all the “Master” data in this Master-Detail form:
- C6: =VLOOKUP($C$5,tblCustomers,MATCH(“Address”,tblCustomers[#Headers],0), FALSE)
- C7: =VLOOKUP($C$5, tblCustomers, MATCH(“City”,tblCustomers[#Headers],0), FALSE) & “, ” & VLOOKUP($C$5, tblCustomers, MATCH(“State”,tblCustomers[#Headers],0), FALSE) & ” ” & VLOOKUP($C$5, tblCustomers, MATCH(“Zip”,tblCustomers[#Headers],0), FALSE)
- H4: =VLOOKUP(rngOrderId, tblOrders, MATCH(F4, tblOrders[#Headers], 0), FALSE)
- H5: =VLOOKUP(rngOrderId, tblOrders, MATCH(F5, tblOrders[#Headers], 0), FALSE)
- H6: =VLOOKUP(rngOrderId, tblOrders, MATCH(F6, tblOrders[#Headers], 0), FALSE)
- H7: =VLOOKUP(rngOrderId, tblOrders, MATCH(F7, tblOrders[#Headers], 0), FALSE)
Figure 6: The completed Master section of the Master-Details form.
Right, so that’s not too bad, and get’s us something that’s pretty robust, even if we add additional data to our source data tables – just as long as we keep the headings in our form matching the column headings in the tables. OK, so now let’s move on to the “Details” part of the form – and this is where we’ll get to find a solution to the VLOOKUP limitation of only returning a single value.
Before we build up the actual solution, let’s talk about some of the elements of the formula we’re going to create:
- We’re not actually going to use VLOOKUP! Since we need to return multiple items, what we really need is a way to return an array (list) of values, and VLOOKUP just doesn’t allow for that. Instead, we’ll use the INDEX function to return the value at a specific row and column intersection.
- We need a way to designate which item in the array we want to show in the cell, and we’ll use a combination of the SMALL and ROW functions to accomplish that.
- We want this to be robust in the same way that we made our VLOOKUP function robust – that is, we want to be sure that if we add additional columns to our source data, that these functions don’t break. Just like with the VLOOKUP solution, we’ll use MATCH to ensure this.
Tip: A nice side effect of what we’re going to do here is that this formula is totally fillable within the “Details” portion of the form, which makes it pretty easy to work with/edit.
- Lastly, we want to have a bit of error handling – specifically, we’ll use IFERROR (a new function in Excel 2007) to ensure that we just show a “blank” cell if there is an error in the calculation.
OK, so without further ado, let’s see that function, as it exists in cell C12 (note that when you enter this function, you enter it without the curly braces, and you press Ctrl+Shift+Enter to commit the cell, which adds the curly braces, making it an array function):
Figure 7: Array function used to return the Nth item that matches a specific value.
That looks pretty complex, so let’s take a look at what’s actually going on here. I’ve pretty printed sections of the formula for easier reading, from the inside out, since that’s how Excel will ultimately calculate it:
MATCH(C$11, tblDetails[#Headers], 0)
Let’s look at the SMALL function first. Small works by taking an array (list) of values, and returns the Nth smallest value from that list.
In this case, the array of values is determined by the IF function. Specifically, IF the Order ID for a row in the Details table equals the Order ID I’ve selected in the drop down, then add the row number of that row (minus the row number of the heading row, in case the table doesn’t start in row 1) to the array of values. And, once the full Details table is analyzed in this way, return the 1st item in the array – ROW(1:1) returns 1.
Now, you’ll see a couple tricks that we’ve used here:
Instead of ROW(1:1) to return the first item in the array, we could have just used the value 1 (or the value 4 if we wanted the 4th item in the array). However, by using the ROW function, Excel will adjust the formula for us as we fill it down a range of cells, so the next row will contain ROW(2:2) for example.
We used structured referencing to make the formula more robust. We could have had cell references for the tblDetails[Order ID] and for the [#Headers], etc. but those would not have adjusted as the source table was modified. By using structured referencing, we have a pretty solid solution here. (and more readable)
OK, let’s move on to the INDEX function next. INDEX works by taking a 2d array (table) and returns the value that is at the specified row and column position in that array.
MATCH(C$11, tblDetails[#Headers], 0)
In this case, we know that the entire SMALL function returns the ROW(Nth) value in the list of Order Details that match the Order ID that I entered – so in this case the 1st value, which is the row number that we want to retrieve a column out of in the Details table. The only thing left then is to specify the column number that we want to retrieve, which as we learned earlier is what MATCH is used for – in this case, taking the value in C11 and finding the column in the Details table that has the same name (Product in this case).
Right, so we see a couple more tricks here:
- To specify the 2d array (table of data), we are again using a structured reference – tblDetails – as opposed to defining the array using a range reference (A1:G50 or whatever). This gives us robustness.
- Ditto with the MATCH function. We could have used the column number explicitly, but by using MATCH, we can much more easily make changes to the source table without having to worry about whether it will break the rest of our model.
- Lastly, you’ll see that with the MATCH function, we’ve made it so that C$11 will adjust across, but will not adjust down – that will come in handy as we want to fill this formula to create our details list.
And with that, all we’re left with is the IFERROR function and the array function designators. IFERROR is pretty simple – it basically says: if an error occurs while calculating the stuff I contain, replace the error value with an empty string (“”). The array function designators (curly braces) are what allow INDEX and SMALL to work over the entire Details table range.
OK, so now that we’ve discussed the formula in C12, let’s quickly finish up the form (and watch the magic of our robust formulas at work). With C12 selected, grab the little fill handle in the bottom right corner of the cell and drag it to H12.
Ooooohhhhhh. Aaaahhhhhh. Notice how that C$11 auto-adjusted to C$12, C$13, etc. as you moved across.
Next, with C12 – H12 selected, grab the fill handle and fill it down 10 rows or so. When you let go, you’ll see, as if by magic, that all of the Order Details appear (as appropriate) and now as you change the Order ID at the top of the sheet, the entire Order Details updates.
Figure 8: The completed details table, with the array formula filled across then down in order to retrieve all details.
Pretty cool huh? We had to create a formula that was a bit more complex than just using VLOOKUP, but we also got a Master-Detail view working in a very robust way, using only built-in Excel formulas (no code, etc.).
Of course, I’m sure there are other, equally slick ways of doing this – if you’ve got one, let’s see it!