Use SUMPRODUCT to find the last item in an Excel list

image Today’s author is Charley Kyd, an Excel MVP, who describes how to use the SUMPRODUCT function to find the last sequential item in an Excel list.

Charley has worked with spreadsheets in business since 1979, and has written books on spreadsheet use for McGraw-Hill and Microsoft Press. You can find more useful tips and posts from Charley and other Excel MVPs at his website, ExcelUser.com.

clip_image002

Suppose you have a Sequential List of items, as in the second table here. And suppose you want to show the date of the first and last entry in the list. How would you do it?

This question isn't limited to dates, by the way. The Sequential List could display numbers or text instead. How would you write the formulas used in the range B4:C6 to return the dates of the first and last items?

Many Excel users could quickly write a formula that returns the first date for each item in the list. They would use an INDEX-MATCH formula like this for the cell shown:

B4:  =INDEX(Date,MATCH(A4,Item,0))

However, INDEX-MATCH won't help us to find the last occurrence of each item in the list. This is because MATCH (with a match-type equal to zero) returns the first item in an unsorted list. There's no version that returns the last item.

However, as the title of this short article suggests, Excel's SUMPRODUCT function provides just the power we need. Here's the formula for the cell shown:

C4:  =INDEX(B:B,SUMPRODUCT(MAX((Item=A4)*ROW(Item))))

Even if you use SUMPRODUCT a lot, you might find it difficult to understand this formula at first glance. So let's take it one step at a time...

clip_image003

SUMPRODUCT works with arrays of data. But it's not limited to mere multiplication, as the "PRODUCT" part of the name implies. In fact, here's how I tend to think of SUMPRODUCT:

SUMPRODUCT = (do stuff as though the formula were entered as an array, and then return the results)

I know, that's not a very scientific explanation, but it's not a bad rule of thumb.

Looking inside the SUMPRODUCT formula above, let's start with: (Item=A4)

Here, the function compares the value of each cell in the Item range (cells A10:A19) to the text in cell A4 ("Hats"). In its memory, Excel sets up an array with TRUE where an Item equals "Hats" and FALSE where it doesn't.

Now consider: (Item=A4)*ROW(Item)

Here, I've told the function to multiply the row number for each item by the corresponding value in the TRUE/FALSE array.

Because TRUE evaluates as 1 in a formula and FALSE as 0, this product returns an array of numbers. Most of the numbers equal zero, where the Item doesn't equal "Hats." But where an Item does equal "Hats," the array contains the row number where the item is found.

The next step: MAX((Item=A4)*ROW(Item))

This is simple. MAX returns the value for the largest row number in the array of numbers and zeros, which is where "Hats" occurs last in the list.

And the next step: SUMPRODUCT(MAX((Item=A4)*ROW(Item)))

SUMPRODUCT is merely the function that tells Excel to handle the earlier calculations as arrays. We use this section of the formula to return one simple number: The number of the row in your worksheet where "Hats" appears last.

Here's the whole formula again:

C4:  =INDEX(B:B,SUMPRODUCT(MAX((Item=A4)*ROW(Item))))

clip_image004

This formula now should start to make more sense. There's only one step left to go:

INDEX(B:B, [the last row number for "Hats"])

Here, INDEX is looking at all of column B in the spreadsheet. Among other information, column B includes the date information in the Sequential List. INDEX returns the date found in the row number specified by the SUMPRODUCT function. And that row number is where the last occurrence of "Hats" can be found.

Finally, you might consider a variation of the formula, a variation that eliminates one risky aspect of this approach: The formula shown above for cell C4 references an entire column.

Referencing an entire column is a problem in this case because if you cut and paste the Sequential List to some other location, the formula still will reference column B, but the list won't be in it. And the formula will fail.

A modified formula corrects this problem:

C4:  =INDEX(Date,SUMPRODUCT(MAX((Item=A4)*ROW(Item)))
          -ROW(Date)+1)

Rather than referencing column B, this formula references the range named Date. To get the index value we need for this range, we subtract the row number of its first row, and then add 1. (Because the Item and Date ranges always must be in the same rows, we also could have used the first row number of the Item range.)

Enter the formula in cell C4 and then copy the range B4:C4 down two rows. When you do so, the new formulas will return the first and last occurrences of Shoes and Socks, respectively.

Office Blogs Comments

Comments: (11) Collapse

  • An alternative solution would be using array functions.

    B4:  {=MIN(IF(Item=A4,Date,1000000))}

    C4: {=MAX(IF(Item=A4,Date,0))}

    0 in the second formula is there because it is the sequential number of a date which is smaller than any date in column Date.

    1000000 in the first formula is there because it is the sequential number of a date which is larger than any date in column Date.

  • "There's no version that returns the last item."

    Thats not entirely true.

    If your data is sorted on Items in Ascending order then Match(A4,Item,1) will return the Last Instance of "Hats"

  • This is not working for me, I am using Excel 2007

    Getting result as "#NAME?" for both Fist Item and Last Item

  • This is not working for me, I am using Excel 2007

    Getting result as "#NAME?" for both Fist Item and Last Item

  • It is because the formula included a range name here.  You should range the range name for both item & data or you just create a table for it.

    Thanks!

  • Also, these array formulas work:

    B4 {=SMALL(IF(Item=A4,Dates),1)}

    C4 {=LARGE(IF(Item=A4,Dates),1)}

    Using the LARGE and SMALL functions has the added advantage that if you want to find the nth smallest/largest (etc), then just change the ",1"   part of the function accordingly (e.g. to ",2" for the 2nd largest or smallest).

  • Interesting. But I think John Kerr's solution is the neatest (so far). It also returns the oldest and youngest even if the table is not sorted in date value.

  • An alternative solution without using array formulas:

    B4: =SUMPRODUCT(SMALL(((Item=A4)*Date+((Item=A4)*Date=0)*10^99),1))

    C4: =SUMPRODUCT(LARGE((Item=A4)*Date,1))

    Note:

    - The above formulas should work even the data is not sorted.

    - For the B4, we multiply 10^99 (a very large number) for the non-match one. In that case, we can use the SMALL function to find the smallest value.

  • Hi,

    All your alternative solutions work well, but the point of the post was that the data was purely sequential. If the dates are not in order then the alternative solutions are very useful as they still bring back the ealiest/latest date. However, the post states the intention as finding the last item in the list, which it will do regardless of the order of the dates, it will always find the last entry for the item. Also, it states not just for dates. Whilst numerical values should work with the alternatives suggested (as long as the max and min values are set appropriately), mixing text with numerical values gets mixed results (correct when a numeric value lies in an appropriate position), but text only does not work with any of the alternatives.

    All that said, they are still useful ideas for pulling out min/max dates from a list, especially when it is not sorted.

    Sean

  • Dear Sean,

    You can just use indirect(    max( .row( ..), 1).  Please try by yourself first!  If you still cannot find the answer, I will tell you.

    Sean, all the problems that you should try to resolve by yourself first.

    Thanks!

    Andrew

  • =LOOKUP(2,1/(Item=A4),Date)

    Array formula entry unnecessary.

    Note: when there are no entries in Item equal to A4, the SUMPRODUCT call returns 0, but 0 is a valid 2nd argument for INDEX. In those situations, the INDEX.,SUMPRODUCT(.)) formula returns an erroneous result, the 1st value in the Date range. So the INDEX.,SUMPRODUCT(.)) formula isn't robust.

Comments

Comments: (loading) Collapse