# Fun With Conditional Formatting

Yesterday I wrote about some work I was doing with Tables recently.  Today I want to do the same for conditional formatting – specifically, using colour scales.  (For a refresher, or for those that are new to this blog, you can read up on changes to conditional formatting in this series of blog posts, and you can read specifically about colour scales here).

In this case, I was looking at a table that contained the results of a set of tests.  The table looked like this (again, I made up the data for this blog entry).

(Click to enlarge)

Essentially, we run a bunch of scenarios against a series of tests periodically and look at the results.  0 is ok, positive numbers are good (the higher the better), and negative numbers are bad (lower means worse).  Seemed like an ideal candidate for a colour scale.  Using a button that I had added to my Quick Access Toolbar, I applied a colour scale to the table.  Here is the result.

(Click to enlarge)

As you can see, there are a couple of problems.  The biggest one is that there are obviously some outliers on the low end of the scale that mean that everything else looks good (remember, that in the default application, the 3-colour scale uses the lowest point, the 50 percent mark, and the highest value to assign colours … if there is an outlier on either end of the data, it can lead to results like this).  You can really see the problem if you click on the image above.  So, to fix that, I made some adjustments to the rule.  Using the manage rules dialog …

(Click to enlarge)

… I made a couple of changes to how the rule was going to be evaluated.  First, I changed the midpoint from 50% to a number, and set that number to 0 (since above 0 was good and below was bad).  Second, I used a formula to determine the value for the minimum (lower bounds) colour to eliminate the effect of the outliers.   Specifically, I changed the minimum to be of type “formula” and entered this formula, which simply determines what the value is of the 5th percentile: =PERCENTILE(\$C\$3:\$I\$52, 0.05).  (Choosing the 5th percentile is not terribly scientific, but I had seen this report enough to know that there were never more than one or two outliers, so using the 5th percentile should handle them).

(Click to enlarge)

So now my rule ran something like this – “turn -1.36 (what the formula calcs to in this data set) and everything below it deep red.  Everything between -1.36 and 0 is a gradient fill between red and yellow.  Everything between 0 and the highest value is a gradient fill between yellow and green.  The results look something like this:

(Click to enlarge)

That was better, but still not what I wanted, since the effect was pretty busy.  So, I decided to make one more change.  Once again I opened up the Edit Rule dialog, and this time I changed the colour for the midpoint from yellow to white.

(Click to enlarge)

Once I hit OK a couple of times, the result looked like this …

(Click to enlarge)

… which is pretty much what I wanted.  Values close to 0 are mostly white, so they are not called out as good or bad visually, the outliers on the negative side (probably measurement problems) are not skewing the results, and there is good gradation on both sides, allowing me to quickly eyeball around for things to investigate.

Hopefully that was interesting – the point being that by tweaking the rules themselves, there is no end of possibilities available with conditional formatting.

• ZivC, we considered advertising this at the top level in the ribbon, but based on our research, the new conditional formatting rules we added covered by far the majority of the cases that people use formulas for in current versions of Excel.  We will continue to think about this area going forward.

• How do I programmatically add a formatting rule? For example, I would like to create a macro that creates a number of formatting rules that can be used later by the user or write an Add-In that generates formatting rules. Does the Excel Object Model include these rules?

Thanks,

Andrew

• Andrew, check out this post for some VBA examples:

blogs.msdn.com/.../481237.aspx

• "the "Formula Is" functionality is hidden, and hard to use"

I don't know about Excel 2007, but in earlier versions it's not hidden at all and is exactly where you would think it should be, under Format. It's "easy" to use. In fact, I use "Formula Is" exclusively. I never use "Cell Value Is".

"use "Formula is" and return TRUE or FALSE based on that formula"

Using a logical comparison that returns a boolean is the most common method, however, you can use any formula expression that returns a numeric value as well. Any numeric value other than 0 will be evaluated as TRUE and the format will be applied. A numeric return of 0 will be evaluated as FALSE and the format will not be applied. A very simple example would be:

=COUNTIF(A\$1:A\$10,"x")

If the result was 0 no format would be applied.

If the result was 7 the format would be applied.

• Biff, the “Formula Is” equivalent is also easy to find in Excel 2007. You select New Rule from the Conditional Formatting menu (which displays after you click the Conditional Formatting button on the Ribbon). In the New Formatting Rule dialog box, there's a list of rule types and one of these is "Use a formula to determine which cells to format."

I suspect that the real concern being alluded to is contained in the "...and hard to use" part. Except for providing examples in a help window, I don't know how the UI could make using specific formulas in conditional formatting more "discoverable." For example, how can the UI expose the fact that a formula like =COUNTIF(\$A\$1:\$A\$10,A1)>1 highlights duplicates in the selected range A1 to A10? From this perspective, formulas would appear "hard to use." However, what you can do is figure out the most common conditions that folks use conditional formatting formulas for and expose these conditions as highlighting options in the IU, thus making these specific conditions “discoverable.”

In this respect, Excel 2007 does a reasonably good job. For example, on the Conditional Formatting menu you can choose to highlight cells containing duplicates, unique values, top N items, top N %, bottom N items, bottom N %, above average, below average and 1 to 3 standard dev above or below average.  You can highlight cells with dates occurring today, yesterday, tomorrow, last 7 days, next month and so on.  You can highlight cells with text containing, not containing, beginning with & ending with.  You can highlight cells with blanks, non-blanks, errors or no errors.  Finally, there’s a column comparison option available to Tables.  To highlight cells with any of the preceding conditions in earlier versions of Excel, you'd need to use formulas, and this is where discoverability becomes an issue for most folks.  In Excel 2007, there should be far less need to use formulas at all.

On a related note, it’s unfortunate that Data Validation wasn’t a similar beneficiary of any new conditions in Excel 2007.  For example, the ability to use UDFs directly in data validation formulas (the workaround to reference a UDF in a worksheet cell is lame and doesn’t work if you want to apply the UDF to a bunch of input cells). Also, I’d have liked to see a “like” operator, so that you could validate a specific input string format.

• How does Excel 2007 conditional formatting handle these:

** highlight cells containing duplicates

Is the first instance of a value considered a duplicate?

A1 = Smith

A2 = Smith

Are both cells highlighted or is just A2 highlighted?

** Top/botton N lists

Are "ties" accounted for?

A1 = 1

A2 = 1

A3 = 1

A4 = 2

A5 = 3

Would the bottom 3 be A1:A3 or A1:A5 ?

Or:

A1 = 1

A2 = 1

A3 = 1

A4 = 1

A5 = 3

Would the bottom 3 be A1:A4 ?

• Is the first instance of a value considered a duplicate?

A1 = Smith

A2 = Smith

Are both cells highlighted or is just A2 highlighted?

<<>>

A1 = 1

A2 = 1

A3 = 1

A4 = 2

A5 = 3

Would the bottom 3 be A1:A3 or A1:A5 ?

<<>>

A1 = 1

A2 = 1

A3 = 1

A4 = 1

A5 = 3

Would the bottom 3 be A1:A4 ?

<<>>

• The only gripe I have about CF Formula Is, is that teenie tiny microscopic little box for the formula. Argh!

• Interesting discussion.  Colin, thanks for the explanation.  One note – I am sorry to say the “column comparison” is going to be gone in the beta refresh (and for 2007).  We turned up some pretty significant bugs recently, and we are past the point where we can do the work required to address them, so that is going to have to wait for the next version.  Thanks for the data validation items.

Biff, I agree with you about the smaller refedits.  That’s something I would like to try and solve all over the place in the future.

• I would upgrade just for that one improvement!

• David,

We develop a product that exports to Excel.  In the past, we have not pushed too much of our metadata over to the Excel side, but with the new Open XML formats, we are in the position to do this much more.  One of the main areas for this is Conditional Formatting.  We are able to define conditional formatting with our product and now push it over to Excel Conditional Formatting rules, rather than just applying the formatting on the cells/ranges themselves.

Our product is capable of dealing with quite a lot of data, as well as having fairly complex formatting rules.  What we found when we started to do this with XLSX is that as with Excel 2003, Excel 2007 remains very buggy in all areas of conditional formatting: display and configuration.

1. On a fairly up to date machine (18 mths old middle end PC, 1GB RAM) XLS (using BIFF8) or XLSX files with more than 2000-3000 conditional formatting 'rules' seems to be the tipping point to poor display performance. We can exceed that very easily with an export from our product.

2.Whereas OpenOffice.org 2.0 will show similar problems when opening an XLS file with large numbers of conditional formats, once opened, it will outperform Excel 2007 in display performance.

3.In our efforts to produce creative formulas in XLS to overcome the 3-rule limit, we discovered that the boolean operator OR() will evaluate both of its expressions, even though the first evaluates to true, (as we later confirmed in the Excel 2003 help). Could this be fixed in 2007?

Seeing as the opening up of the Excel file format to developers (only true Excel dependents such as ourselves tackled native BIFF8) will allow many more applications to create rich, fully featured Excel files, now potentially up to 1 million rows and numerous columns in size, could you review the way Excel handles conditional formatting from a performance perspective.

The legacy way that conditional formatting is implemented in openxml, might benefit from a different approach, such as using formatting IDs as with traditional xfids, allowing cells to be a member of a cell formatting group, as opposed to a post-display evaluation

I can certainly generate very complex files for you, if you need them.