Writing a Count(Distinct) Query in Access

We received a question from a reader named Eric week before last about how to write a Count(Distinct) query in Access. In other database applications such as SQL Server, Oracle, and MySQL, the Count(Distinct) aggregate is used to count the number of unique items in a given rowset. For example, consider a table on SQL Server named tblColors, such as:

ColorValue
Red 5
Green 2
Blue 8
Orange 1
Red 8
Green 6
Blue 2

To get a count of the number of unique colors in the table, you could write a query such as:

SELECT Count(Distinct Color) AS N FROM tblColors

This would return the value 4 as there are four unique colors in the Color field in the table. Unfortunately, the Access Database Engine does not support the Count(Distinct) aggregate. To return this value from an Access table, you would need to use a subquery such as:

SELECT Count(*) AS N
FROM
(SELECT DISTINCT Color FROM tblColors) AS T;

Now let's say that you also want to include another aggregate value such as a Sum, and want to group by some value, in this case, Color. On SQL Server, you could write this query as:

SELECT Color, Sum(Value) AS Total, Count(Distinct Color) AS N
FROM tblColors
GROUP BY Color

This provides the following results:

ColorTotalN
Blue 10 1
Green 8 1
Orange 1 1
Red 13 1

Now, if you're asking whether or not this should return the value of '1', the answer is yes. As I understand it, the Count(Distinct) here can be used as a test to verify the results of a given query.

If your data is on a server that supports Count(Distinct), you might be able to use a pass-through query to retrieve the results. If you are working with Access data, this becomes a bit more challenging.

Since we used subqueries for the previous query, we'll need to do the same here. The trick however is that we need to use two subqueries as shown in the following SQL:

SELECT C.Color, Sum(C.Value) AS Total, T2.N
FROM
    (SELECT T.Color, Count(T.Color) AS N
     FROM
        (SELECT DISTINCT Color, Count(*) AS N
         FROM tblColors GROUP BY Color) AS T
    GROUP BY T.Color) AS T2
INNER JOIN tblColors AS C
ON T2.Color = C.Color
GROUP BY C.Color, T2.N;

You'll recognize the inner-most subquery as the subquery we used in the initial problem. This is then rolled into the outer-most subquery to return the number of colors per group.

Thanks to Eric for the question and to Michael Tucker on the Access test team for the help!

Office Blogs Comments

Comments: (3) Collapse

  • I find it easier to use a saved query, qryColorSums, within another query for this: qryColorSums = SELECT tblColors.Color, Sum(tblColors.Valu) AS Total

    FROM tblColors

    GROUP BY tblColors.Color; qryColorSumsCount = SELECT qryColorSums.Color, First(qryColorSums.[Total]) AS Total, Count(qryColorSums.Color) AS N

    FROM qryColorSums

    GROUP BY qryColorSums.Color; qryCount = SELECT Count(qryColorSums.Color) AS [Count]

    FROM qryColorSums; All this is made easier to understand and remember by viewing in the graphic query designer (paste the SQL text into a query's SQL View box and then selecting the Design View).

  • Have there been improvements, updates, changes to the SQL Access uses, going from Access 2003 to Access 2007?

  • Frederick - the saved query might make this easier to debug and test. Interestingly enough, the subqueries used here appear in the design surface as a derived table, but the designer doesn't make the distinction between a saved query or derived table in this case. Grovelli - the SQL syntax used in Access 2007 is the same as in Access 2003. Rob

Comments

Comments: (loading) Collapse