Back to Blog

How to Remove Duplicates in Google Sheets | Ultimate Guide

Imagine the following: you have imported data from your database to Google Sheets to create a report. While scanning through the entries, you spotted that some of them are duplicated like this:

Would you remove the duplicate rows manually? This could be an option, but if your data set is large enough, you’d be better off automating the removal. In this guide, we’ll show you all the possible options to delete duplicates using native functionality, formulas or tricky workarounds.

The easiest way to remove duplicates in Google Sheets

You don’t need to manually delete each duplicate row, since Google Sheets can do this for you. Go to the Data menu and select Remove duplicates.

Then you’ll need to select the columns to analyze for duplicates. For example, you want to remove entries that have a duplicate name or date. In this case, you’ll need to select these columns only. In our case, we want to totally delete duplicate entries. So, we select all columns and click Remove duplicates.

The next moment, the duplicate rows have been removed successfully.

This functionality will work for larger data sets as well. However, its main drawback is that you’ll have to manually repeat it each time you add new entries to the data set. And what if you have added more than 50 entries and you’re not sure whether there are any duplicates? Below, you’ll find the solutions to automate the identification and removal of duplicate entries.

Remove duplicates in Google Sheets: possible reasons

Usually, users need to solve one of the following goals: 

  • Remove duplicates in one column
  • Remove duplicates in multiple columns

How to remove duplicates in one column in Google Sheets 

When you are dealing with a single column or an integral data range, UNIQUE function is the best option to remove duplicates. You’ll need one formula to do the job.

Alternatively, you can identify duplicates using the COUNTIF function in a separate column, or even highlight them using Conditional Formatting Google Sheets. Then you’ll be able to remove them manually or by using the Remove Duplicates functionality. Jump to the needed section if this is what you need.

How to remove duplicates in Google Sheets from 2 or more different columns

This requires a more advanced approach, and you can benefit from multiple solutions:

  • Pivot table – it will automatically remove duplicates in a separate sheet
  • UNIQUE function – it can remove duplicates from an integral data range
  • QUERY function – it’s a bit intricate, but a reliable solution
  • Two step approach – where you first identify duplicates by a unique ID and then remove them using an advanced formula 

We would recommend you check out the pivot table first and then move to formula-based options.

How to remove duplicates in Google Sheets but keep their position

The best way to remove duplicates without ruining the order of your data set is to use Conditional Formatting. This option includes two steps:

  • Identifying and highlighting duplicates 
  • Removing duplicates manually 

This will take time, but you’ll be able to handle the task as you need to.

If you have a clear vision on which solution will do the job for you, go to the relevant section right away. If you don’t, check out all of them and choose the best one for your needs. 

How to remove duplicates in Google Sheets using pivot tables

The idea of a pivot table is that you can rotate your data set to view it from a different perspective or reorganize data without changing it. We won’t change the perspective, but simply display data in the pivot table, which identifies and removes duplicates itself as follows:

To do this, go to the Data menu and select Pivot table.

Then select the data range to analyze and specify where the pivot table will be created: either in a new or existing sheet.

The pivot table was created and you need to set it up using the Pivot table editor that you’ll see on the right of your spreadsheet. 

Add rows to your pivot table

In our example, we wanted to transfer all the columns from our data set. To do this, click Add (Rows section) and select the column. We also unchecked the Show totals checkbox.

You’ll have to repeat this action for each column separately. In the end, you’ll get the pivot table with the duplicates removed:

Add values to your pivot table

If you want to know which entries were duplicated, go to the Values section in the Pivot table editor and click Add.

Choose the column (s) to analyze for duplicates and select COUNTA in the Summarize by field. You’ll get a separate column with the numbers of entries:

  • 1 means that there is one entry
  • 2 or more means that this entry was duplicated

Now, if you add new entries to your initial data set, they will be automatically identified in this column and removed in your pivot table. 

We have introduced the basic pivot table configuration for this use case. Read our Pivot Table Google Sheets guide to customize it for your needs.

How to delete duplicates in Google Sheets using formulas

If you’re still reading this, then the options above probably did not solve your problem. Okay, let’s dive deeper and explore the Google Sheets functions that you can delete duplicates with. 

How to remove duplicates in Google Sheets using the UNIQUE formula

UNIQUE is a Google Sheets function to return unique values from a range. So, it’s meant to remove duplicates. It has a very simple syntax:

=UNIQUE(data-range)

For example, let’s extract unique values from our Transaction Type column (B2:B). Apply the following formula and here we go:

=unique(B2:B)

If you need to remove duplicates based on the analysis of multiple columns, like in our case, UNIQUE will do the job as well. Specify the range to check and it works:

=unique(Sheet1!A1:J)

The drawback of the UNIQUE option is that you cannot choose which columns you want to analyze for duplicates. But you can do this with QUERY.

Use Google Sheets QUERY function to remove duplicates in a spreadsheet on multiple columns

The Google Sheets QUERY function lets you select specific columns of your data set and delete duplicate entries from them. The QUERY formula syntax is the following:

=QUERY(data-range,query)

The query to remove duplicates should contain:

  • SELECT clause to select the columns you want to query. 
  • GROUP BY clause to group values across the selected data range. GROUP BY only works if you apply one of the aggregation functions (avg, count, max, min, sum) within the SELECT clause.
  • OFFSET or WHERE clause to skip an empty row at the top

Here is how the query may look:

"Select A,min(B),H,I group by A,H,I offset 1"

Let’s check it out with an example. We need to remove duplicates from the same data set that we used above. Another condition is to return only four columns from it: Date, Transaction Type, Split, and Amount. Here is the QUERY formula to do this:

=query(A2:I,"Select A,min(B),H,I group by A,H,I offset 1")

We showed it next to the results obtained with the UNIQUE formula, so you can compare the difference. The duplicates have been removed and QUERY sorted the entries in descending order. Let’s explain how it worked:

  • Select A,min(B),H,I – we specified the columns we need.
  • min(B) – we have to apply an aggregation function within the SELECT clause, so we choose min.
  • group by A,H,I – we specified the columns to group by (the column with the aggregated function cannot be included here).
  • offset 1 – we used OFFSET to remove the empty row at the top. Alternatively, you can use the WHERE clause (where A Is not NULL) to exclude entities that have no numeric value.

If the QUERY option is what you need but you think it’s a bit tricky, here’s an alternative solution.

Identify and remove duplicates from Google Sheets on multiple columns using the merging functions

This option consists of two steps:

  • Identify duplicates in the columns you want to analyze.
  • Remove duplicate entries. 

Identify duplicates in your data set

To identify the duplicate entries in specific columns, you’ll need to create a unique identifier based on the values from these columns. Let me explain this with an example.

In our data set, we want to remove duplicates analyzing two columns: Name and Amount. Note that we exclude the Date column from analysis. 

To make a unique identifier for each entry, you’ll need to merge values from these columns into a single string (one string for each row). To do this, create a new column on the left and apply the following formula:

={"Id";ARRAYFORMULA(CONCAT(F$2:F,J$2:J))}
  • F$2:F – Name column
  • J$2:J – Amount column
  • CONCAT(F$2:F,J$2:J) – CONCAT formula to merge two values together
  • ARRAYFORMULA – function to expand the CONCAT formula 
  • {"Id"; – to assign the column name 

As a result, we now have unique IDs that let us identify duplicates easily:

Note: If you need to search for duplicates on more than two columns, the CONCAT function won’t do the job. You can replace it with another function or method to merge data such as &(ampersand), CONCATENATE, JOIN, etc. Read our blog post to learn more about how to merge cells in Google Sheets.

Remove duplicates using their unique IDs

Create a new sheet where you want to import the data without duplicates and apply the following formula:

={
  Sheet1!A1:J1;
  SORTN(
       FILTER(Sheet1!A2:J, LEN(Sheet1!A2:A) > 0),
       ROWS(FILTER(Sheet1!A2:J, LEN(Sheet1!A2:A) > 0)),
       2,1,true
  )
}
  • Sheet1!A2:J – data set with duplicate entries
  • FILTER(Sheet1!A2:J, LEN(Sheet1!A2:A) > 0) – to exclude empty rows from the initial data set 
  • ROWS(FILTER(Sheet1!A2:J, LEN(Sheet1!A2:A) > 0)) – the number of rows in the initial data set
  • SORTN(...,2,1,true) – to show at most the first n rows after removing duplicate rows and sort by the first column in ascending order
  • {Sheet1!A1:J1 – to import column names from the data set with duplicate entries

It worked! The formula returned the initial data set without duplicates by taking the first occurrence of each row.

Is this still not what you’re looking for? Keep going then.

How to remove duplicates in Google Sheets using Conditional Formatting

The Conditional Formatting functionality is not actually for deleting duplicates. You can use this feature to highlight them. So, for removal, you’ll have to complete two steps:

  • Identify and highlight duplicates with Conditional formatting
  • Delete duplicates

Identify and highlight duplicates in Google Sheets using Conditional Formatting and the COUNTIF function 

Go to the Format menu and select Conditional Formatting

Apply the following Conditional formatting rules, depending on your goals:

Highlight duplicates in one column

Let’s say you want to highlight duplicates in one column. Select the column range, for example B1:B. Select “Custom formula is” as a formatting rule and apply the following COUNTIF formula:

=COUNTIF($B$1:$B1,B1)>1

Note: the values in the formula relate to the column range you choose.

And here is how it looks:

If you want to learn the logic behind this formula, check out the video “6 Ways to Highlight Duplicates in Google Sheets” by Railsware. 

Highlight duplicates in multiple columns

Let’s say you want to highlight duplicates in a few columns: Date (A1:A), Transaction Type (B1:B), Name (E1:E), and Amount (I1:I). Select the data range that will cover the mentioned columns, in our case – A1:I. Select “Custom formula is” as a formatting rule and apply the following COUNTIF formula:

=(COUNTIF($A$1:$A,$A1)>1)*
(COUNTIF($B$1:$B,$B1)>1)*
(COUNTIF($E$1:$E,$E1)>1)*
(COUNTIF($I$1:$I,$I1)>1)

Note: the values in the formulas relate to the columns you choose.

Duplicates on multiple conditions will be highlighted:

Remove duplicates with the Remove Duplicates tool

When you see the highlighted duplicates in your data set, you can easily remove them. You can do this manually or using the Remove Duplicates functionality that we described at the very beginning.

How to remove duplicates in Google Sheets using Apps Script

We had doubts as to whether we needed to include this option here, but we wanted to cover all the possible solutions. Apps Script is one of those. The idea is quite simple: you can create a custom function that will remove duplicates based on the preset conditions. It’s quite useful when you have a sheet that continuously accumulates new entries. You can specify the conditions (columns to analyze) in the function and simply run it every time you want to remove duplicates. Here is how it works:

Go to the Tools menu and select Script Editor.

Here is the script for the basic function to remove duplicates. It analyzes all columns in the active sheet and removes duplicate rows:

function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  for (var i in data) {
    var row = data[i];
    var duplicate = false;
    for (var j in newData) {
      if (row.join() == newData[j].join()) {
        duplicate = true;
      }
    }
    if (!duplicate) {
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Insert it into the code block and rename your project:

Click the Save project button, then click Run next to it.

You’ll need to grant permission to access your data. Click review permissions and choose the Google account. You may get the following warning window.

Just click Advanced, and then Go to **** (unsafe).

Eventually, you’ll get back to the Script Editor where you’ll see that your duplicate removal function has been executed.

After this, it will remove duplicates in your active sheet whenever you click the Run button.

Customize the Apps Script function to remove duplicates on a specific range

The key value of using Apps Script is that you can customize it to your needs. Let’s say you want to remove duplicates on a specific range (B1:F). Here is how you need to tweak the script to make it work.

function removeDuplicates() {
var sheet = SpreadsheetApp.getActiveSheet();
var rng = sheet.getRange("B1:F")
var data = rng.getValues();
var newData = new Array();
for(i in data){
var row = data[i];
var duplicate = false;
for(j in newData){
  if(row.join() == newData[j].join()){
    duplicate = true;
  }
}
if(!duplicate){
  newData.push(row);
}
}
rng.clearContents();
sheet.getRange(1, 1, newData.length, 
newData[0].length).setValues(newData);
}

For more advanced customization, you’ll need to dive deeper into the App Script specifics. 

Bonus stage: remove duplicates from data imported from HubSpot

In this bonus stage, we’d like to share a real-life use case of how Coupler.io users remove duplicates from their imported data. Coupler.io is a tool to import data to Google Sheets from different sources. One of these sources is HubSpot. Coupler.io provides a HubSpot importer, which can import raw data from HubSpot to Google Sheets automatically on a schedule. Once the data reaches the spreadsheet, the magic of Google Sheets begins. Here is how it looks:

Step 1: HubSpot to Google Sheets

The HubSpot data is imported automatically into the spreadsheet where the entries (again automatically) get their unique IDs using the following formula:

={"Id";
  ARRAYFORMULA(
    IF(
     LEN(J$2:J)>0,
     CONCAT(J$2:J,B$2:B),
     ""
    )
   )
}

Step 2: Deduplication and query

The imported HubSpot data is rid of duplicates (deduplicated) using the following formula:

={Imported!A1:J1;
  SORTN(
    SORT(
      FILTER(
        Imported!A2:J, 
        LEN(Imported!A2:A) > 0
       ),
    1,2),
  ROWS(Imported!A2:A),
  2,1,true
  )
}

Then the QUERY formula extracts the required columns to another sheet:

=QUERY(Imported_DEDUPLICATED_all_columns!A1:J, "select A, B, C, D, F, G, H, J, I")

Step 3: Extraction of new entries only

The following formula extracts new entries from the deduplicated HubSpot data based on the analysis of previously processed data which is stored in the sheets Open and Closed: 

={Imported_DEDUPLICATED!A$1:J$1;
  ARRAYFORMULA(
    FILTER(
     Imported_DEDUPLICATED!A$2:J,
     ISERROR(
      MATCH(
       VALUE(Imported_DEDUPLICATED!A$2:A),
       VALUE(Open!B:B),
       0)
     ),
     ISERROR(
      MATCH(
       VALUE(Imported_DEDUPLICATED!A$2:A),
       VALUE(Closed!B:B),
       0)
      )
    )
  )
}

Data import and deduplication are absolutely automatic: 

  • Coupler.io pulls data from HubSpot on a custom schedule 
  • Google Sheets formulas deduplicate data once it gets to the spreadsheet

The combination of these tools can provide you with astonishing integrations and automations. So, you should definitely try them out. Good luck!

Back to Blog

Comments are closed.

Access your data
in a simple format for free!

Start Free