Estuary

What is PostgreSQL DISTINCT Clause? + How To Use It

Learn how to use the PostgreSQL DISTINCT clause to easily eliminate duplicate records and fetch unique records -- with examples and syntax.

Share this article

There are several methods to select distinct values from query results in PostgreSQL. One of the ways is by using the DISTINCT keyword. You can use the PostgreSQL DISTINCT keyword in conjunction with the SELECT statement. It eliminates all duplicate records and fetches only unique records.

This article provides a brief explanation of PostgreSQL and its features. You will also gain information about the PostgreSQL DISTINCT clause and the different syntaxes. Finally, you’ll be presented with some examples of this PostgreSQL clause.

Before diving into the PostgreSQL DISTINCT clause, let’s get an overview of PostgreSQL.

What is PostgreSQL?

PostgreSQL, commonly known as Postgres, is an enterprise-class, open-source object-relational database management system (ORDBMS). It’s one of the most compliant, stable, and mature relational databases available today.

It was initially developed in 1986 as a follow-up to INGRES (an open-source SQL relational database project of the early 1970s). SQL support was added to the project in 1994, leading to PostgreSQL.

You can use PostgreSQL for both SQL (relational) and JSON (non-relational) queries. It also provides support for different SQL functions, like triggers, subqueries, foreign keys, and different user-defined functions.

The strong reputation of Postgres is owing to its reliability, extensibility, proven architecture, data integrity, and robust feature set. PostgreSQL is used as the primary database or data warehouse for many web, mobile, and analytics applications.

Postgres isn’t controlled by any corporation or private entity. Being an open-source software, you can access its source code under the PostgreSQL License.

Features of PostgreSQL

There are a few features of the PostgreSQL database that make it widely favored over other databases. Here are those features that have earned PostgreSQL a spot in the top-3 most popular databases in the world:

  • Point-in-time recovery (PITR): PostgreSQL logs every database change at all times with a write-ahead log. You can use PITR to restore databases to a specific moment in time when you run data recovery initiatives. It helps restore file systems to a stable starting point.
  • Stored procedures: PostgreSQL has built-in support for multiple procedural languages. You can create custom subroutines called stored procedures and call them on a given database.
  • Open source: PostgreSQL source code, under an open-source license, is freely available for you to use, modify, and implement as you see fit.
  • Extensions: PostgreSQL has foreign data wrappers that can fetch data from an external source or transmit data to it for update commands. Other robust features include granular access controls, asynchronous replication, and multi-version concurrency control.
  • Community support: PostgreSQL has a dedicated community of contributors and enthusiasts who regularly find bugs and fixes. They contribute to the overall system security and are readily available to support other users.
  • Standards compliance: PostgreSQL is ACID-compliant and runs on all major operating systems. It also tries to conform with the SQL standard. This means that most of the features required by the SQL standard are supported, though with slightly different syntax or function.
  • Dynamic Loading: You can include manually written code into the PostgreSQL server via dynamic loading. This can be done by specifying an object code file to implement a new type or function, and PostgreSQL will load it as required.

What is Postgres DISTINCT Clause?

In PostgreSQL, the DISTINCT clause is used to fetch only unique records. The use of the DISTINCT clause retains only one row from a set of duplicated rows.

You can use the DISTINCT clause on single or multiple columns. For multiple columns, it uses a combination of the targeted columns’ values to determine the duplicates. The clause skips only those records that have duplicates in the selected columns.

The DISTINCT clause must be used only with the SELECT statement.

Syntax

The syntax for the DISTINCT clause in PostgreSQL is:

plaintext
SELECT DISTINCT | DISTINCT ON (distinct_expressions) expressions FROM tables [WHERE conditions];

Parameters or Arguments

Here are what the different parameters or arguments in the syntax imply:

  • distinct_expressions: The expressions used to remove duplicates.
  • expressions: The columns to evaluate for unique values.
  • tables: The name of the tables to retrieve the records. You must specify at least one table with the FROM operator.
  • WHERE conditions: This is optional. It represents the conditions that must be satisfied for the selection of records.

Important Points

You must keep some things in mind when you use the DISTINCT clause syntax.

  • If you provide only one expression in the DISTINCT clause, the query returns the unique values for that expression.
  • If you provide more than one expression in the DISTINCT clause, the query retrieves unique combinations for the listed expressions.
  • The DISTINCT clause doesn’t ignore NULL values. When you use the clause in your SQL statement, NULL gets included as a distinct value in the result set.

How To Use the Postgres DISTINCT Clause?

Now that you know the Postgres DISTINCT clause syntax, here are a few different ways to use it.

Different Postgres with DISTINCT Syntaxes

Depending on the different values of distinct_expressions and the use of DISTINCT or DISTINCT_ON, there are three possible syntaxes.

Syntax #1

plaintext
SELECT DISTINCT column1 FROM table_name;

This syntax uses the DISTINCT PostgreSQL clause with a single expression. The DISTINCT clause evaluates the duplicate rows in column1 of the table_name. The output will only show the list of unique values in column1.

Syntax #2

plaintext
SELECT DISTINCT column1, column2 FROM table_name;

This syntax uses the DISTINCT Postgres clause with multiple expressions. The clause is applied to the multiple columns of the specified table. It checks the combination of the columns of the table for unique values. After evaluation, the Postgres DISTINCT clause displays the output for the combination of values in column1 and column2.

Syntax #3

plaintext
SELECT DISTINCT ON (column1) column_alias, column2 FROM table_name ORDER BY column1, column2;

When using the DISTINCT clause in PostgreSQL, you have one more option called DISTINCT ON. It is used with the SELECT statement for removing duplicates from the query set result.

The DISTINCT ON clause returns only the first row of each group of duplicates. The SELECT statement returns the rows in random order. So, the use of DISTINCT ON with the SELECT statement will return a random first row of each group of the duplicate.

For a more organized result set, use the ORDER BY clause with the DISTINCT ON clause. The first row returned by the DISTINCT ON will be based on the ORDER BY clause provided in the query. Also, any other fields listed within the SELECT statement will be returned for that first row.

Examples of Postgres DISTINCT Clause

To help you understand the use cases of the different syntaxes of the Postgres SELECT DISTINCT clause, let’s look into a few examples. Consider the following table titled colorproperties. It has two columns: fore_color and back_color.

postgresql distinct clause colorproperties

Image Source: Educba

Postgres DISTINCT with Single Column

The PSQL SELECT DISTINCT statement can be used to fetch the unique column values of fore_color from the colorproperties table. The following statement will help execute this. By using the ORDER BY clause, the values in the fore_color column will be sorted in alphabetical order.

plaintext
SELECT DISTINCT fore_color FROM ColorProperties ORDER BY fore_color
postgresql distinct clause single column

Image Source: Educba

Postgres DISTINCT with Multiple Columns

DISTINCT in PSQL can also be used for multiple columns. The following statement is used for PSQL Distinct multiple columns. Here, the DISTINCT clause is used for two columns i.e., fore_color and back_color. After evaluating the combination of the values in these two columns, it returns the unique values.

plaintext
SELECT DISTINCT fore_color, Back_color FROM ColorProperties ORDER BY fore_color, back_color;
postgresql distinct clause colorproperties

Image Source: Educba

Since the SELECT DISTINCT clause contains both fore_color and back_color, the values of both these columns are combined. The duplicate rows of the result set are then removed, and only the unique values of both columns are returned.

Postgres DISTINCT with DISTINCT ON

For a sorted result set after the removal of duplicate rows, here’s the syntax to use:

plaintext
SELECT DISTINCT ON (back_color) backgroundcolor, fore_color FROM ColorProperties ORDER BY back_color, fore_color;
postgresql distinct clause distinct on

Image Source: Educba

The result set is sorted by back_color and fore_color. Then, for each group of duplicates, only the first row is kept in the returned result set.

Postgres DISTINCT for Multiple Tables

You can use the DISTINCT PgSQL clause for multiple tables. Let’s consider an example of two tables named students and departments.

First, we use the CREATE TABLE statement to create the students table that consists of two columns: name and department_id.

plaintext
CREATE TABLE students ( name text not null, department_id text not null, CONSTRAINT students_pk PRIMARY KEY (name) );

Next, we insert some rows into the students table by using the INSERT statement.

plaintext
INSERT INTO students (name, department_id) VALUES (‘Jacob’, ‘101’), (‘David’, ‘102’), (‘John’,’103’);

Then, we create a table for departments with two columns: department_id and department_name.

plaintext
CREATE TABLE departments ( department_id text not null, department_name text not null );

Next, we insert values into the table.

plaintext
INSERT INTO departments (department_id, department_name) VALUES (‘101’, ‘Computer’), (‘102’, ‘Electrical’), (‘103’, ‘Mechanical’);

To query for unique values by joining the students and departments tables, here’s the statement to use:

plaintext
SELECT DISTINCT ON (s.department_id) s.department_id, s.name, d.department_name FROM students s JOIN departments d ON d.department_id = s.department_id ORDER BY s.department_id DESC

The (s.department_id) expression is defined for the DISTINCT ON clause to order the result set.

postgresql distinct clause multiple tables

Image Source: Educba

Conclusion

Whether you're working with a single column, multiple columns, or multiple tables, the Postgres DISTINCT clause, coupled with the SELECT statement, enables you to retrieve unique rows efficiently. For more intricate scenarios where you need a sorted result set, the DISTINCT ON clause, combined with the ORDER BY clause, offers an effective solution.

Additionally, for those engaging with Postgres in conjunction with other systems, there's an opportunity to enhance data movement. The Estuary Flow data integration and pipelining platform streamlines data transfer to and from PostgreSQL, ensuring efficiency and accuracy.

Moreover, in the realm of real-time data synchronization, the concept of Postgres Change Data Capture (CDC) emerges as a valuable technique. CDC enables the capture and tracking of database changes, facilitating seamless replication to other systems or enabling analytical insights. By identifying new, modified, or deleted records, CDC ensures data consistency across platforms, making it a vital tool for data-driven applications.

In summary, this article introduced PostgreSQL as a robust database system with features like the DISTINCT clause for querying unique data. It also touched upon the potential of Change Data Capture for real-time data synchronization. By leveraging these tools, you can bolster your data management capabilities and empower your applications with accurate, up-to-date information.

Work with Postgres and other systems? Consider incorporating the Estuary Flow platform to facilitate swift and efficient data movement between PostgreSQL and various destinations.

Furthermore, explore the realm of Change Data Capture for real-time synchronization, elevating the performance of your data-intensive applications.

Check out the blog for tutorials to connect Postgres to a variety of source and destination systems, including:

Start streaming your data for free

Build a Pipeline
Share this article

Table of Contents

Build a Pipeline

Start streaming your data for free

Build a Pipeline

About the author

Picture of Jeffrey Richman
Jeffrey Richman

With over 15 years in data engineering, a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.

Popular Articles

Streaming Pipelines.
Simple to Deploy.
Simply Priced.
$0.50/GB of data moved + $.14/connector/hour;
50% less than competing ETL/ELT solutions;
<100ms latency on streaming sinks/sources.