Use AutoFilter to Delete Duplicate Worksheet Entries

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Use AutoFilter to Delete Duplicate Worksheet Entries

by Sean Kavanagh

Application: Excel 2000
Operating System: Windows

When you maintain long lists of data in Excel, you'll inevitably have to deal with the issue of duplicate data. No matter how vigilant you are, it seems that duplicates manage to work their way into your data. Fortunately, Excel has built-in features that you can use to ensure that duplicated information is kept to a minimum. In this article, we'll show you how to use a worksheet formula and Excel's AutoFilter feature to single out duplicate entries. Once identified, you can quickly delete the data.

Overview of the technique

There are two tools in Excel that lend themselves to the task of working with duplicates: the AutoFilter and Advanced Filter features. We'll work with the AutoFilter in this article and take a look at the alternative technique in a future article. To use the AutoFilter, your data should be organized in a standard list format, where each column in the list represents a data field and each row represents a record. Ideally, the first row of the list should contain field names.

To identify duplicates, we'll add a column to the list and use a conditional worksheet formula that returns a 0 the first time that a particular data item is encountered in the list. Each additional occurrence of the item will return a result of 1 in the column. To review or remove the duplicates, we'll use the AutoFilter tool to display only the records flagged with 1.

Putting the technique into practice

To illustrate our technique, open a new workbook and enter the data shown in Figure A. Our example will look for duplicates based on data in the Name column. The first step is to add the column of flags that indicates duplicate records. In cell C1, enter the field label Duplicates. Then, in cell C2, enter the formula

=IF(COUNTIF(A$2:A2,A2)>1,1,0)

Figure A: We'll flag the duplicate entries using a worksheet formula.

This formula uses the COUNTIF function, which counts the number of cells in a specified range that meet particular criteria. This first argument in the COUNTIF function, in this case A$2:A2, is the range that's being searched. The second argument, A2, indicates the criteria we're looking for.

In our formula, we're specifying that we want to count the number of times that the value stored in cell A2 appears within range A$2:A2. The COUNTIF function is nested within an IF function, which returns the result ultimately displayed in cell C2. If the result of the COUNTIF function is greater than 1, our formula returns 1; otherwise it returns 0. The COUNTIF function in our worksheet formula evaluates to 1 because the value Lou appears once in the range A$2:A2, so the value 0 appears in cell C2.

You may have noticed that we only used the dollar sign symbol in the first part of our COUNTIF range address. Doing so means that we can copy the formula to other worksheet cells and the address used in that part of the formula will always refer to row 2. The reason why this is important will become clear in a moment.

To finish flagging our records, select cell C2 so that we can copy the worksheet formula to the rest of the appropriate cells in column C. To do so, double-click on the fill handle (the small square in the lower-right corner of the cell). At this point, your worksheet resembles Figure B.

Figure B: The duplicate records have a 1 displayed in column C.

Click in cell C6. Notice that the formula displayed in the
Formula bar is

=IF(COUNTIF(A$2:A6,A6)>1,1,0)

The formula arguments have adjusted relative to the current cell's distance from the original formula. In the case of this formula, the COUNTIF criteria stored in A6 appears twice in range A$2:A6. Since 2 is greater than 1, the IF function returns 1, identifying the record in row 6 as a duplicate entry.

Removing the duplicates

Now that you've marked the duplicates, you probably want to remove them. This is where we'll put the AutoFilter to work. First, ensure that a cell within the used range is selected. Then, choose Data | Filter | AutoFilter from the menu bar. Dropdown arrows now appear along with the field names in the first row of the range. Next, click on the dropdown arrow in the Duplicates cell and select 1. Only the duplicate records are displayed at this point, as shown in Figure C.

Figure C: The AutoFilter hides all of the first occurrences, leaving only the duplicates.

To remove the duplicates, simply select the row headings for rows 6 through 10. Then, press [Ctrl] and the minus key on the numeric keypad (-) or choose Edit | Delete Row from the menu bar. To restore your view of the data list, choose (All) from the Duplicates dropdown arrow. Only the unique items remain, as shown in Figure D.

Figure D: The duplicate entries have all been removed from our list.

Things aren't always this easy

The example we looked at in this article dealt with duplicates based upon a single data field. Unfortunately, you may not always be able to base your duplicate identification on one field. For instance, you may want to compare duplicates based on first and last names, or add location criteria to your evaluation. We'll look at how to address duplicates that require more complex comparison criteria in a future issue.

Copyright © 2001 Element K Content LLC. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Element K Content LLC is prohibited. Element K is a service mark of Element K LLC.