Group numbers into custom ranges

I was looking at SQM data about file size last week and ran into an interesting question. We capture the size of the file every time Access opens the file in SQM. It is interesting data as it provides insights into the size of a typical existing app. I wanted to bucket the ranges into something more readable with smaller buckets at the low end and wider buckets at the top end. Something like:

image

(The actual numbers have been changed.)

First, I created a table called FileSizeRanges that defined the custom ranges.

 image

The size of the files was stored in a table called LOCALDATA_FileDetails in a field called SizeOfFileKb.

I created three queries:

qryFileSizeRanges

SELECT LOCALDATA_FileDetails.[SizeOfFileKb], FileSizeRanges.*
FROM LOCALDATA_FileDetails, FileSizeRanges
WHERE [SizeOfFileKb]>=MinSize And [SizeOfFileKb]<MaxSize;

qryFileSizeRangeCount (shows friendly name for each group)

SELECT ID, MB, Count(SizeOfFileKb) AS RangeCount
FROM qryFileSizeRanges
GROUP BY ID, MB;

qryFileSizeRangeDisplay (shows the MinSize and MaxSize values for the friendly name)

SELECT MinSize & "-" & MaxSize AS Range, RangeCount
FROM qryFileSizeRangeCounts;

The qryFileSizeRangeCounts query takes 2 minutes and 31 seconds to rip through 3.5M records on my ThinkPad T60 running Windows 7 (BTW – I’m loving Win 7). Seems like reasonable performance to me. I turned on the totals row to get the total count and pasted it into Excel to visually clean up.

You don’t need to use the qryFileSizeRangeDisplay query if you want to use the friendly names in the FileSizeRanges.MB column for friendly display names. In my case I had converted KB into MB and provided some friendly text to make it more readable.

Office Blogs Comments

Comments: (3) Collapse

  • I wrote a similar article that uses IIF functions here

    www.vb123.com.au/.../trickyqueries.htm Garry Robinson

  • I wonder what the performance would be like if you indexed the MinSize and Maxsize fields. Then try indexing the SizeOfFileKb field. Just for the fun of it. I had to do a query about ten or thirteen years ago on double fields which were in a Cartesion join. It was trying to locate all the places on a 1000 mile long pipeline where both cracks and corrosion were within a few metres of each other. Indexing the location, which was a double field, used in the Between dropped the query from 25 minutes to 25 seconds.

  • Why not use the PARTITION function?

Comments

Comments: (loading) Collapse