SQL - Handling Duplicates
SQL is a language used to manage and manipulate data in relational databases. One of the most common issues that arises while working with databases is the presence of duplicate records. Duplicates occur when the same data is entered into a table more than once, either accidentally or intentionally. Handling duplicates in SQL involves identifying, filtering, removing, or merging duplicate records in a table.
Why is Handling Duplicates in SQL Necessary?
There are various reasons why handling duplicates in a database is necessary. One of the main reasons is that duplicates in an organizational database lead to logical errors. In addition, redundant data needs to be handled to prevent the following consequences −
Duplicate data occupies storage space, which reduces the efficiency of the database.
Because more resources are used, the overall cost of managing those resources rises.
As duplicates introduce more logical errors, the conclusions drawn from analyzing the data also become erroneous.
Methods to Handle Duplicates
As duplicates accumulate in a database, several methods can be used to handle them. They are listed below −
- Using Distinct Keyword
- Using Group By Clause
- Using Union Clause
Let us learn more about these methods in detail below.
Using Distinct Keyword
We can handle duplicates in SQL by using the DISTINCT keyword. It is used with the SELECT statement to eliminate duplicate rows from the result and retrieve only unique records. Note that DISTINCT only affects the query result; the rows stored in the table are not changed.
Syntax
The basic syntax of a DISTINCT keyword to eliminate duplicate records is as follows.
SELECT DISTINCT column1, column2, ..., columnN
FROM table_name
WHERE [condition];
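When more than one column is listed, DISTINCT treats the combination of all listed columns as the value to compare: a row is dropped only if every selected column matches another row. A minimal sketch, assuming a small hypothetical table named VISITS that exists only for this illustration:

```sql
-- Hypothetical table used only to illustrate multi-column DISTINCT
CREATE TABLE VISITS (
  CITY VARCHAR(30) NOT NULL,
  VISIT_YEAR INT NOT NULL
);

INSERT INTO VISITS VALUES
  ('Delhi', 2022),
  ('Delhi', 2022),   -- exact repeat of the row above
  ('Delhi', 2023),   -- same CITY, different VISIT_YEAR
  ('Kota',  2022);

-- DISTINCT compares the (CITY, VISIT_YEAR) pair as a whole, so this
-- returns three rows: (Delhi, 2022), (Delhi, 2023) and (Kota, 2022)
SELECT DISTINCT CITY, VISIT_YEAR
FROM VISITS
ORDER BY CITY, VISIT_YEAR;

-- With only CITY selected, both Delhi rows collapse into one,
-- so this returns two rows: Delhi and Kota
SELECT DISTINCT CITY
FROM VISITS
ORDER BY CITY;
```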
Example
Consider the CUSTOMERS table having the following records.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
First, let us see how the following SELECT query returns duplicate salary records.
SQL> SELECT SALARY FROM CUSTOMERS ORDER BY SALARY;
This would produce the following result, where the salary 2000.00 appears twice because two customers (Ramesh and kaushik) have that salary in the original table.
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
Now, let us use the DISTINCT keyword with the above SELECT query and see the result.
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY;
Output
This would produce the following result where we do not have any duplicate entry.
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
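DISTINCT can also be used inside an aggregate function. The sketch below recreates the CUSTOMERS data shown above so it runs on its own; the column types are an assumption, since this section does not show the table's CREATE TABLE statement. COUNT(SALARY) counts every non-NULL salary, while COUNT(DISTINCT SALARY) counts each salary value only once.

```sql
-- Recreate the CUSTOMERS data from the example above.
-- Assumption: the column types below are illustrative, not taken from the section.
CREATE TABLE CUSTOMERS (
  ID INT NOT NULL,
  NAME VARCHAR(20) NOT NULL,
  AGE INT NOT NULL,
  ADDRESS CHAR(25),
  SALARY DECIMAL(18, 2),
  PRIMARY KEY (ID)
);

INSERT INTO CUSTOMERS VALUES
  (1, 'Ramesh',   32, 'Ahmedabad',  2000.00),
  (2, 'Khilan',   25, 'Delhi',      1500.00),
  (3, 'kaushik',  23, 'Kota',       2000.00),
  (4, 'Chaitali', 25, 'Mumbai',     6500.00),
  (5, 'Hardik',   27, 'Bhopal',     8500.00),
  (6, 'Komal',    22, 'MP',         4500.00),
  (7, 'Muffy',    24, 'Indore',    10000.00);

-- ALL_SALARIES counts all 7 salaries; UNIQUE_SALARIES counts the
-- repeated 2000.00 only once, giving 6
SELECT COUNT(SALARY)          AS ALL_SALARIES,
       COUNT(DISTINCT SALARY) AS UNIQUE_SALARIES
FROM CUSTOMERS;
```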
Using Group By Clause
We can also merge identical records into one using the GROUP BY clause. Following is the syntax to do so −
SELECT column_name(s) FROM table_name GROUP BY column_name(s);
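Before merging or removing anything, the same grouping can be used to identify duplicates: adding COUNT(*) with a HAVING filter lists each repeated row and how many times it occurs. A minimal sketch, assuming a hypothetical table named PRODUCTS created only for this illustration:

```sql
-- Hypothetical table used only to illustrate finding duplicates
CREATE TABLE PRODUCTS (
  CODE INT NOT NULL,
  PRODUCT_NAME VARCHAR(30) NOT NULL
);

INSERT INTO PRODUCTS VALUES
  (10, 'PEN'),
  (11, 'PENCIL'),
  (10, 'PEN'),
  (12, 'ERASER'),
  (10, 'PEN');

-- Each group is one distinct (CODE, PRODUCT_NAME) pair; HAVING keeps only
-- the groups that occur more than once. Here that is (10, 'PEN') with 3 copies.
SELECT CODE, PRODUCT_NAME, COUNT(*) AS COPIES
FROM PRODUCTS
GROUP BY CODE, PRODUCT_NAME
HAVING COUNT(*) > 1;
```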
Example
In this example, we create a new table named EMPLOYEE using the query below −
CREATE TABLE EMPLOYEE (
   EID INT NOT NULL,
   EMPLOYEE_NAME VARCHAR (30) NOT NULL,
   SALES_MADE DECIMAL (20)
);
Now, we can insert values into this empty table using the INSERT statement as follows −
INSERT INTO EMPLOYEE VALUES (102, 'SARIKA', 4500);
INSERT INTO EMPLOYEE VALUES (100, 'ALEKHYA', 3623);
INSERT INTO EMPLOYEE VALUES (101, 'REVATHI', 1291);
INSERT INTO EMPLOYEE VALUES (103, 'VIVEK', 3426);
INSERT INTO EMPLOYEE VALUES (100, 'ALEKHYA', 3623);
The EMPLOYEE table consists of the details of employees in an organization and the sales made by them. Note that the record for ALEKHYA (EID 100) was inserted twice.
+-----+---------------+------------+
| EID | EMPLOYEE_NAME | SALES_MADE |
+-----+---------------+------------+
| 102 | SARIKA        |       4500 |
| 100 | ALEKHYA       |       3623 |
| 101 | REVATHI       |       1291 |
| 103 | VIVEK         |       3426 |
| 100 | ALEKHYA       |       3623 |
+-----+---------------+------------+
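Using the following GROUP BY query, we merge the duplicate records in the table into one record. GROUP BY by itself does not guarantee any particular order, so an ORDER BY clause is added to arrange the result in ascending order of EID.
SELECT EID, EMPLOYEE_NAME, SALES_MADE
FROM EMPLOYEE
GROUP BY EID, EMPLOYEE_NAME, SALES_MADE
ORDER BY EID;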
Output
The table displayed is as follows −
+-----+---------------+------------+
| EID | EMPLOYEE_NAME | SALES_MADE |
+-----+---------------+------------+
| 100 | ALEKHYA       |       3623 |
| 101 | REVATHI       |       1291 |
| 102 | SARIKA        |       4500 |
| 103 | VIVEK         |       3426 |
+-----+---------------+------------+
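A GROUP BY query merges duplicates only in the query result; the EMPLOYEE table itself still stores both ALEKHYA rows. One portable way to remove the duplicates from the table is sketched below, assuming a hypothetical scratch table named EMPLOYEE_DEDUP: copy one instance of each distinct row into it, empty the original table, and copy the rows back. On a real table you would normally run these steps inside a transaction, so that a failure part-way through does not leave EMPLOYEE empty.

```sql
-- Recreate the EMPLOYEE table and its data from the example above
CREATE TABLE EMPLOYEE (
  EID INT NOT NULL,
  EMPLOYEE_NAME VARCHAR(30) NOT NULL,
  SALES_MADE DECIMAL(20)
);

INSERT INTO EMPLOYEE VALUES
  (102, 'SARIKA', 4500),
  (100, 'ALEKHYA', 3623),
  (101, 'REVATHI', 1291),
  (103, 'VIVEK', 3426),
  (100, 'ALEKHYA', 3623);

-- Hypothetical scratch table with the same columns as EMPLOYEE
CREATE TABLE EMPLOYEE_DEDUP (
  EID INT NOT NULL,
  EMPLOYEE_NAME VARCHAR(30) NOT NULL,
  SALES_MADE DECIMAL(20)
);

-- Keep exactly one copy of every distinct row
INSERT INTO EMPLOYEE_DEDUP
SELECT EID, EMPLOYEE_NAME, SALES_MADE
FROM EMPLOYEE
GROUP BY EID, EMPLOYEE_NAME, SALES_MADE;

-- Replace the original contents with the de-duplicated rows
DELETE FROM EMPLOYEE;

INSERT INTO EMPLOYEE
SELECT EID, EMPLOYEE_NAME, SALES_MADE
FROM EMPLOYEE_DEDUP;

DROP TABLE EMPLOYEE_DEDUP;

-- EMPLOYEE now stores 4 rows, one per employee
SELECT * FROM EMPLOYEE ORDER BY EID;
```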
Using Union
UNION is an SQL operator that works like the union operator in relational algebra. It combines the result sets of two or more SELECT statements that are union compatible, that is, statements that return the same number of columns with compatible data types.
Only distinct rows are included in the result, because UNION automatically eliminates duplicate rows.
Syntax
Following is the syntax of UNION operator in SQL −
SELECT * FROM table1 UNION SELECT * FROM table2;
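Because UNION returns only distinct rows, it removes repeats that occur within a single SELECT as well as repeats that occur across the two SELECT statements. A minimal sketch, assuming two hypothetical single-column tables named LIST_A and LIST_B:

```sql
-- Hypothetical tables used only to illustrate UNION's duplicate removal
CREATE TABLE LIST_A (ITEM VARCHAR(10) NOT NULL);
CREATE TABLE LIST_B (ITEM VARCHAR(10) NOT NULL);

INSERT INTO LIST_A VALUES ('X'), ('X'), ('Y');  -- 'X' is repeated inside LIST_A
INSERT INTO LIST_B VALUES ('Y'), ('Z');         -- 'Y' also appears in LIST_A

-- Returns three rows: X, Y and Z
SELECT ITEM FROM LIST_A
UNION
SELECT ITEM FROM LIST_B
ORDER BY ITEM;
```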
Example
Let us first create two tables, COURSES_PICKED and EXTRA_COURSES_PICKED, with the same number of columns and the same data types.
Create table COURSES_PICKED using the following query −
CREATE TABLE COURSES_PICKED(
   STUDENT_ID INT NOT NULL,
   STUDENT_NAME VARCHAR(30) NOT NULL,
   COURSE_NAME VARCHAR(30) NOT NULL
);
Insert values into the COURSES_PICKED table with the help of the query given below −
INSERT INTO COURSES_PICKED VALUES(1, 'JOHN', 'ENGLISH');
INSERT INTO COURSES_PICKED VALUES(2, 'ROBERT', 'COMPUTER SCIENCE');
INSERT INTO COURSES_PICKED VALUES(3, 'SASHA', 'COMMUNICATIONS');
INSERT INTO COURSES_PICKED VALUES(4, 'JULIAN', 'MATHEMATICS');
The table will be displayed as −
+------------+--------------+------------------+
| STUDENT_ID | STUDENT_NAME | COURSE_NAME      |
+------------+--------------+------------------+
|          1 | JOHN         | ENGLISH          |
|          2 | ROBERT       | COMPUTER SCIENCE |
|          3 | SASHA        | COMMUNICATIONS   |
|          4 | JULIAN       | MATHEMATICS      |
+------------+--------------+------------------+
Create table EXTRA_COURSES_PICKED using the following query −
CREATE TABLE EXTRA_COURSES_PICKED(
   STUDENT_ID INT NOT NULL,
   STUDENT_NAME VARCHAR(30) NOT NULL,
   EXTRA_COURSE_NAME VARCHAR(30) NOT NULL
);
Following is the query to insert values into the EXTRA_COURSES_PICKED table −
INSERT INTO EXTRA_COURSES_PICKED VALUES(1, 'JOHN', 'PHYSICAL EDUCATION');
INSERT INTO EXTRA_COURSES_PICKED VALUES(2, 'ROBERT', 'GYM');
INSERT INTO EXTRA_COURSES_PICKED VALUES(3, 'SASHA', 'FILM');
INSERT INTO EXTRA_COURSES_PICKED VALUES(4, 'JULIAN', 'MATHEMATICS');
The table will be displayed as −
+------------+--------------+--------------------+
| STUDENT_ID | STUDENT_NAME | EXTRA_COURSE_NAME  |
+------------+--------------+--------------------+
|          1 | JOHN         | PHYSICAL EDUCATION |
|          2 | ROBERT       | GYM                |
|          3 | SASHA        | FILM               |
|          4 | JULIAN       | MATHEMATICS        |
+------------+--------------+--------------------+
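Now, let us combine both these tables using the UNION query as follows. The ORDER BY clause at the end sorts the combined result by student ID and course name −
SELECT * FROM COURSES_PICKED
UNION
SELECT * FROM EXTRA_COURSES_PICKED
ORDER BY STUDENT_ID, COURSE_NAME;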
Output
The resultant table obtained after performing the UNION operation is −
+------------+--------------+--------------------+
| STUDENT_ID | STUDENT_NAME | COURSE_NAME        |
+------------+--------------+--------------------+
|          1 | JOHN         | ENGLISH            |
|          1 | JOHN         | PHYSICAL EDUCATION |
|          2 | ROBERT       | COMPUTER SCIENCE   |
|          2 | ROBERT       | GYM                |
|          3 | SASHA        | COMMUNICATIONS     |
|          3 | SASHA        | FILM               |
|          4 | JULIAN       | MATHEMATICS        |
+------------+--------------+--------------------+
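Since the record (4, 'JULIAN', 'MATHEMATICS') appears in both tables, UNION eliminates the duplicate and returns it only once, so the result contains 7 rows instead of 8. The result columns take their names from the first SELECT statement, which is why the third column is named COURSE_NAME.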
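When every row should be kept, including the repeated MATHEMATICS entry for JULIAN, UNION ALL can be used instead of UNION, since it skips the duplicate-elimination step. The sketch below recreates both tables from the example so it runs on its own, and compares the two operators on the same data:

```sql
-- Recreate both tables from the example above
CREATE TABLE COURSES_PICKED(
  STUDENT_ID INT NOT NULL,
  STUDENT_NAME VARCHAR(30) NOT NULL,
  COURSE_NAME VARCHAR(30) NOT NULL
);
CREATE TABLE EXTRA_COURSES_PICKED(
  STUDENT_ID INT NOT NULL,
  STUDENT_NAME VARCHAR(30) NOT NULL,
  EXTRA_COURSE_NAME VARCHAR(30) NOT NULL
);

INSERT INTO COURSES_PICKED VALUES
  (1, 'JOHN', 'ENGLISH'),
  (2, 'ROBERT', 'COMPUTER SCIENCE'),
  (3, 'SASHA', 'COMMUNICATIONS'),
  (4, 'JULIAN', 'MATHEMATICS');
INSERT INTO EXTRA_COURSES_PICKED VALUES
  (1, 'JOHN', 'PHYSICAL EDUCATION'),
  (2, 'ROBERT', 'GYM'),
  (3, 'SASHA', 'FILM'),
  (4, 'JULIAN', 'MATHEMATICS');

-- UNION ALL keeps every row from both inputs: 4 + 4 = 8 rows,
-- including both copies of (4, 'JULIAN', 'MATHEMATICS')
SELECT * FROM COURSES_PICKED
UNION ALL
SELECT * FROM EXTRA_COURSES_PICKED
ORDER BY STUDENT_ID, COURSE_NAME;
```

Running the same query with UNION in place of UNION ALL returns the 7 rows shown in the output above.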