How To Delete Duplicate Rows in PostgreSQL

Preparing sample data

First, create a new table named cars that stores car makes and models:

CREATE TABLE cars(
    id SERIAL PRIMARY KEY,
    make VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);

Second, insert some makes and models into the cars table:

INSERT INTO cars (make, model) VALUES ('honda', 'pilot');
INSERT INTO cars (make, model) VALUES ('toyota', 'prius');
INSERT INTO cars (make, model) VALUES ('honda', 'pilot');
INSERT INTO cars (make, model) VALUES ('toyota', 'rav4');
INSERT INTO cars (make, model) VALUES ('honda', 'crv');
INSERT INTO cars (make, model) VALUES ('toyota', 'prius');
INSERT INTO cars (make, model) VALUES ('subaru', 'outback');
INSERT INTO cars (make, model) VALUES ('subaru', 'impreza');
INSERT INTO cars (make, model) VALUES ('honda', 'crv');

Third, query data from the cars table:

SELECT * FROM cars;

Finding duplicate rows

If the table has only a few rows, you can spot the duplicates at a glance. With a large table, that is not practical.

To find the duplicate rows, use the following statement:

SELECT make, model, COUNT(model)
FROM cars
GROUP BY make, model
HAVING COUNT(model) > 1
ORDER BY make, model;
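
The grouped query reports which (make, model) pairs are duplicated, but not which individual rows are involved. As a minimal sketch, a window function can list every row in a duplicate group along with its id. The alias dup_count is a name chosen here for illustration, not something the table defines.

```sql
-- List every row that belongs to a duplicate (make, model) group,
-- together with the number of rows in that group.
SELECT id, make, model, dup_count
FROM (
    SELECT id, make, model,
           COUNT(*) OVER (PARTITION BY make, model) AS dup_count
    FROM cars
) AS t
WHERE dup_count > 1
ORDER BY make, model, id;
```

Against a freshly created table holding only the sample data above, this should return six rows: ids 5 and 9 (honda crv), 1 and 3 (honda pilot), and 2 and 6 (toyota prius).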

Deleting duplicate rows using the DELETE USING statement

The following statement uses the DELETE USING statement to remove duplicate rows:

DELETE FROM cars a USING cars b
WHERE a.id < b.id AND a.make = b.make AND a.model = b.model;

In this example, we joined the cars table to itself and checked whether two different rows (a.id < b.id) have the same values in the make and model columns.

The statement removed the duplicate rows with the lower ids and kept the one with the highest id.
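
If you would rather keep the row with the lowest id in each group, one option is to flip the comparison so that a row is deleted whenever a matching row with a smaller id exists. This is a sketch meant to be run instead of the statement above, against a table that still contains the duplicates.

```sql
-- Delete every row that has a duplicate with a lower id,
-- which leaves the lowest-id row of each (make, model) group in place.
DELETE FROM cars a
USING cars b
WHERE a.id > b.id
  AND a.make = b.make
  AND a.model = b.model;
```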

Deleting duplicate rows using a subquery

The following statement uses a subquery to delete duplicate rows and keep the row with the lowest id:

DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY make, model ORDER BY id) AS row_num
        FROM cars
    ) t
    WHERE t.row_num > 1
);

In this example, the subquery returned every row in each duplicate group except the first one, and the outer DELETE statement deleted the rows returned by the subquery.
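
Because a DELETE is hard to undo once committed, it can help to try the statement inside a transaction first. The sketch below runs the same delete with a RETURNING clause so you can see which rows would be removed, then rolls the change back. Whether you end with ROLLBACK or COMMIT is up to you.

```sql
BEGIN;

-- Same delete as above, but RETURNING shows the rows being removed.
DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY make, model ORDER BY id) AS row_num
        FROM cars
    ) AS t
    WHERE t.row_num > 1
)
RETURNING id, make, model;

-- Inspect the returned rows, then undo the change.
-- Replace ROLLBACK with COMMIT to make the deletion permanent.
ROLLBACK;
```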

Deleting duplicate rows using an intermediate table

To delete duplicate rows using an intermediate table, you use the following steps:

  1. Create a new table with the same structure as the one whose duplicate rows should be removed.
  2. Insert distinct rows from the source table into the intermediate table.
  3. Drop the source table.
  4. Rename the intermediate table to the name of the source table.

The following illustrates the steps of removing duplicate rows from the cars table:

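-- step 1: copy the column definitions (LIKE without INCLUDING options
-- copies columns and NOT NULL constraints, but not defaults or indexes)
CREATE TABLE cars_temp (LIKE cars);

-- step 2: keep one row per (make, model), the one with the lowest id.
-- Selecting DISTINCT over id as well would treat every row as unique.
INSERT INTO cars_temp (id, make, model)
SELECT DISTINCT ON (make, model) id, make, model
FROM cars
ORDER BY make, model, id;

-- step 3
DROP TABLE cars;

-- step 4
ALTER TABLE cars_temp RENAME TO cars;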
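
One thing to be aware of: CREATE TABLE ... (LIKE cars) without any INCLUDING options copies column definitions and NOT NULL constraints only, so the renamed table has neither the primary key nor the automatic id generation of the original. The following is a sketch of one way to restore them, assuming the four steps above have already run and that an identity column (PostgreSQL 10 or later) is an acceptable substitute for the original SERIAL default.

```sql
-- Re-create the primary key that LIKE did not copy.
ALTER TABLE cars ADD PRIMARY KEY (id);

-- Re-create automatic id generation using an identity column.
ALTER TABLE cars ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- Move the new sequence past the highest existing id so that
-- future inserts do not collide with the rows that were kept.
SELECT setval(pg_get_serial_sequence('cars', 'id'), (SELECT MAX(id) FROM cars));
```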

Source: https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/
