Preparing sample data
First, create a new table named cars that stores the make and model of each car:
CREATE TABLE cars (
    id    SERIAL PRIMARY KEY,
    make  VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);
Second, insert some rows into the cars table. Several (make, model) pairs are inserted twice, which gives us duplicates to remove:
INSERT INTO cars (make, model) VALUES ('honda', 'pilot');
INSERT INTO cars (make, model) VALUES ('toyota', 'prius');
INSERT INTO cars (make, model) VALUES ('honda', 'pilot');
INSERT INTO cars (make, model) VALUES ('toyota', 'rav4');
INSERT INTO cars (make, model) VALUES ('honda', 'crv');
INSERT INTO cars (make, model) VALUES ('toyota', 'prius');
INSERT INTO cars (make, model) VALUES ('subaru', 'outback');
INSERT INTO cars (make, model) VALUES ('subaru', 'impreza');
INSERT INTO cars (make, model) VALUES ('honda', 'crv');
Third, query data from the cars table:
SELECT * FROM cars;
Finding duplicate rows
If the table has only a few rows, you can spot the duplicates at a glance. With a large table, however, that is no longer practical.
To find the duplicate rows, group the rows by make and model and keep only the groups that occur more than once:
SELECT make, model, COUNT(model)
FROM cars
GROUP BY make, model
HAVING COUNT(model) > 1
ORDER BY make, model;
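The GROUP BY query reports how many copies of each (make, model) pair exist, but not which ids they carry. The sketch below is one way to list every row that belongs to a duplicate group together with its id, using COUNT(*) as a window function partitioned by make and model. So that it can run on its own, it recreates the sample rows in a temporary table named cars; that setup is an assumption of this example, and the temporary table shadows any permanent cars table for the rest of the session.

```sql
-- Assumed setup: the sample rows in a session-only temporary table.
CREATE TEMP TABLE cars (
    id    SERIAL PRIMARY KEY,
    make  VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);
INSERT INTO cars (make, model) VALUES
    ('honda', 'pilot'), ('toyota', 'prius'), ('honda', 'pilot'),
    ('toyota', 'rav4'), ('honda', 'crv'), ('toyota', 'prius'),
    ('subaru', 'outback'), ('subaru', 'impreza'), ('honda', 'crv');

-- For each row, count how many rows share its (make, model) pair,
-- then keep only the rows whose pair occurs more than once.
SELECT id, make, model, dup_count
FROM (
    SELECT id, make, model,
           COUNT(*) OVER (PARTITION BY make, model) AS dup_count
    FROM cars
) t
WHERE dup_count > 1
ORDER BY make, model, id;
```

With the sample data this returns both copies of honda crv, honda pilot, and toyota prius, six rows in total.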
Deleting duplicate rows using the DELETE USING statement
The following statement uses the DELETE USING statement to remove duplicate rows:
DELETE FROM cars a
USING cars b
WHERE a.id < b.id
  AND a.make = b.make
  AND a.model = b.model;
In this example, we joined the cars table to itself and checked whether two different rows (a.id < b.id) have the same values in the make and model columns.
For every such pair, row a is deleted, so the statement removes all duplicates except the row with the highest id in each group.
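The direction of the id comparison decides which copy survives. If you would rather keep the row with the lowest id, flipping the operator to a.id > b.id should do it, as in the sketch below. It again assumes a session-only temporary table named cars holding the sample rows, so that it can run on its own.

```sql
-- Assumed setup: the sample rows in a session-only temporary table.
CREATE TEMP TABLE cars (
    id    SERIAL PRIMARY KEY,
    make  VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);
INSERT INTO cars (make, model) VALUES
    ('honda', 'pilot'), ('toyota', 'prius'), ('honda', 'pilot'),
    ('toyota', 'rav4'), ('honda', 'crv'), ('toyota', 'prius'),
    ('subaru', 'outback'), ('subaru', 'impreza'), ('honda', 'crv');

-- Delete each row for which another row with the same make and model
-- has a smaller id; the lowest-id row of every group is kept.
DELETE FROM cars a
USING cars b
WHERE a.id > b.id
  AND a.make = b.make
  AND a.model = b.model;
```

On the sample data this leaves six rows, one per (make, model) pair.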
Deleting duplicate rows using a subquery
The following statement uses a subquery to delete duplicate rows and keep the row with the lowest id:
DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY make, model ORDER BY id) AS row_num
        FROM cars
    ) t
    WHERE t.row_num > 1
);
In this example, ROW_NUMBER() numbers the rows of each (make, model) group in ascending id order, so the subquery returns every row of a duplicate group except the first one, the one with the lowest id. The outer DELETE statement then deletes the rows returned by the subquery.
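A committed DELETE cannot be undone, so it can be worth running the inner query on its own first to see exactly which rows the subquery approach would remove. The sketch below does that; the temporary cars table with the sample rows is an assumption made so the example is self-contained.

```sql
-- Assumed setup: the sample rows in a session-only temporary table.
CREATE TEMP TABLE cars (
    id    SERIAL PRIMARY KEY,
    make  VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);
INSERT INTO cars (make, model) VALUES
    ('honda', 'pilot'), ('toyota', 'prius'), ('honda', 'pilot'),
    ('toyota', 'rav4'), ('honda', 'crv'), ('toyota', 'prius'),
    ('subaru', 'outback'), ('subaru', 'impreza'), ('honda', 'crv');

-- Preview only: number the rows of each (make, model) group by ascending id
-- and list every row after the first, i.e. the rows the DELETE would remove.
SELECT id, make, model, row_num
FROM (
    SELECT id, make, model,
           ROW_NUMBER() OVER (PARTITION BY make, model ORDER BY id) AS row_num
    FROM cars
) t
WHERE t.row_num > 1
ORDER BY make, model, id;
```

On the sample data this lists the second copy of honda crv, honda pilot, and toyota prius, and the table itself is left unchanged.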
Deleting duplicate rows using an intermediate table
To delete duplicate rows using an intermediate table, you use the following steps:
- Create a new table with the same structure as the one whose duplicate rows should be removed.
- Insert one copy of each distinct row from the source table into the intermediate table.
- Drop the source table.
- Rename the intermediate table to the name of the source table.
The following illustrates the steps of removing duplicate rows from the cars table:
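-- step 1: create an empty table with the same columns as cars
-- (LIKE copies column names, types, and NOT NULL constraints, but not the
-- primary key or the SERIAL default unless INCLUDING options are given)
CREATE TABLE cars_temp (LIKE cars);
-- step 2: copy one row per (make, model) pair, keeping the lowest id
-- (SELECT DISTINCT make, model, id would copy every row, because id is unique)
INSERT INTO cars_temp (id, make, model)
SELECT DISTINCT ON (make, model) id, make, model
FROM cars
ORDER BY make, model, id;
-- step 3
DROP TABLE cars;
-- step 4
ALTER TABLE cars_temp RENAME TO cars;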
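In PostgreSQL, DDL statements such as CREATE TABLE, DROP TABLE, and ALTER TABLE are transactional, so the four steps can be wrapped in a single transaction: if any step fails, the original table is left untouched. The sketch below assumes you want that safety and also adds back the primary key, which LIKE does not copy. Step 2 uses DISTINCT ON (make, model) so that exactly one row per pair, the one with the lowest id, is copied. To run on its own it builds the sample data in a temporary table named cars and makes cars_temp temporary as well; both are assumptions of this sketch, and with permanent tables the same statements apply without TEMP.

```sql
-- Assumed setup: the sample rows in a session-only temporary table.
CREATE TEMP TABLE cars (
    id    SERIAL PRIMARY KEY,
    make  VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL
);
INSERT INTO cars (make, model) VALUES
    ('honda', 'pilot'), ('toyota', 'prius'), ('honda', 'pilot'),
    ('toyota', 'rav4'), ('honda', 'crv'), ('toyota', 'prius'),
    ('subaru', 'outback'), ('subaru', 'impreza'), ('honda', 'crv');

BEGIN;
-- step 1: empty table with the same columns (temporary in this sketch)
CREATE TEMP TABLE cars_temp (LIKE cars);
-- step 2: one row per (make, model) pair, the one with the lowest id
INSERT INTO cars_temp (id, make, model)
SELECT DISTINCT ON (make, model) id, make, model
FROM cars
ORDER BY make, model, id;
-- step 3
DROP TABLE cars;
-- step 4
ALTER TABLE cars_temp RENAME TO cars;
-- LIKE did not copy the primary key, so add it back
ALTER TABLE cars ADD PRIMARY KEY (id);
COMMIT;
```

Afterwards the id column has no SERIAL default, because LIKE did not copy it and the original sequence was dropped together with the old table, so new rows need an explicit id unless a default is added back.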
Source: https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/