HOWTO: Count Unique Groups of Items in Excel

I find myself constantly in a situation where I have a giant list of data and I need to know which entries in that data occur most frequently. I’ve always used Excel pivot tables for this purpose.  Recently someone saw me solve a problem using this technique and they were very surprised and wished they’d known this years ago. With that in mind, I figure I’d share my technique.

Let’s create a made-up scenario for demonstration purposes. Let’s say you want to know which DLL on your computer loaded in memory most frequently. The scenario doesn’t matter, it’s just an excuse to generate some data.

Ultimately, you’re going to end up with Excel open with a bunch of columns of data similar to this:

As you can see, we have 2,092 rows of data with the fourth column being FileName. We want to produce a list in order of frequency of which DLL is listed most often. To do that complete the following steps:

  • Select either the column you’re interested in or the entire sheet depending on your needs
  • Select Insert / Pivot Table
  • The range will default to what you selected. Since you selected all the data, simply press OK

 

  • In the right hand pane, you’ll note the columns from your sheet are listed. Select the column you want to group and sort. In this case select FileName

 

 

  • If you look closely, you’ll see that the column you selected is added under Rows in the bottom right

 

  • You should see your data appear in the left hand side. But there is no quantity. To get that, select the FileName column again from the top right hand pane and drag it into the Values section in the bottom right
  • The field name will change to Count of [ColumnName]
  • A new column will be included by the same name. This automatically groups the data. But it doesn’t sort it yet

     

 

  • Sorting is a little tricky. It turns out you can’t use the giant sort button on the ribbon. If you do, you’ll get an error:

     

     

  • Instead, you need to right click in the within the column region (highlighted in yellow below) and choose Sort from there.

 

  • Tada! We now know that ntdll.dll is the most loaded module on my machine. I’m sure you can think of a few other scenarios where this technique will be useful

 

 

 

 

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.