
Most "S3 to Postgres" guides skip the method half their readers came for. If you are on Amazon RDS or Aurora, Postgres can read a file straight out of S3 with a single function call, no scripts and no extra service. If you are on self-hosted Postgres, you cannot use that, and you need a different path. And if you want Postgres to keep loading new files as they land in S3, none of the one-shot methods will do.
So before the steps, decide which situation you are in. That choice, not the tooling, is what makes this easy or painful.
Quick Comparison: Loading S3 Data into PostgreSQL
| Method | Best for | Freshness | Setup effort | Keeps loading new files? |
|---|---|---|---|---|
aws_s3extension | RDS or Aurora Postgres, one-time or scripted CSV loads | On demand | Low (if you are on RDS/Aurora) | No, you re-run it |
Python + \\copy | Self-hosted Postgres, custom parsing or transforms | On demand or scheduled | Medium, you own the code | Only if you build and schedule it |
| AWS DMS | AWS-native teams doing a periodic or one-time migration | Minutes to hours | High | Yes, with ongoing tasks |
| Estuary | Continuous, hands-off loading as files arrive | Near real-time | Low | Yes, automatically |
Short version: on RDS or Aurora for a one-time CSV load, use aws_s3 and stop reading. On self-hosted Postgres, script it with Python and \\copy. For continuous loading without owning code, use a pipeline tool.
Why move S3 data into Postgres at all
S3 is cheap, durable object storage, and it is where raw files tend to land: exports, logs, third-party drops, backups. Postgres is where you actually query and join that data. Moving files from S3 into Postgres tables turns a pile of objects into something you can run SQL against, and it gives you a relational copy if the S3 version is ever lost or overwritten.
Method 1: The Native aws_s3 Extension (RDS and Aurora)
If your Postgres runs on Amazon RDS or Aurora, this is the method to reach for first, and most articles do not mention it. The aws_s3 extension lets Postgres pull a file directly from S3 into a table without any external script or service.
One thing to be clear about up front: this only works on RDS and Aurora PostgreSQL. It is not available on self-hosted Postgres, because the extension and the AWS integration are part of the managed service. If you self-host, skip to Method 2.
First, install the extension. CASCADE pulls in the aws_commons dependency it relies on:
sqlCREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
Next, give the database permission to read your bucket. Attach an IAM role to your RDS instance or Aurora cluster with read access to the bucket, and associate that role with the instance's S3 import feature. Using a role rather than keys means the import call carries no credentials.
Then create the target table and import the file:
sqlCREATE TABLE customers (
id INT,
name VARCHAR(255),
signup_date TIMESTAMP
);
SELECT aws_s3.table_import_from_s3(
'customers', -- target table
'', -- column list, empty means all columns
'(format csv, header true)', -- file options
aws_commons.create_s3_uri('my-bucket', 'customers.csv', 'us-east-1')
);
A few things experience teaches you here. The extension reads CSV, text, and gzip, not Parquet, so convert Parquet first if that is your format. The S3 region in create_s3_uri has to match the bucket's actual region or the call fails. And the import does not transform anything, so the table has to match the file. If a column will not cast cleanly, the whole import errors out rather than loading partial rows.
This is the right answer for a one-time or occasionally repeated CSV load on managed Postgres. It is the wrong answer if you need continuous loading, since you have to re-run the function every time new files show up.
Method 2: Python and psql for Self-Hosted Postgres
On self-hosted Postgres, you write a little code. The pattern is download from S3, then stream the file into Postgres with COPY. It is more work to maintain, but you get full control over parsing and transforms.
Pull the file with boto3:
pythonimport boto3
s3 = boto3.client("s3")
s3.download_file("my-bucket", "customers.csv", "/tmp/customers.csv")
Then load it. The fastest, most reliable load is COPY, and psycopg2's copy_expert lets you stream the file straight in:
pythonimport psycopg2
conn = psycopg2.connect(
host="your-host", dbname="your-db", user="your-user", password="your-password"
)
with conn.cursor() as cur, open("/tmp/customers.csv") as f:
cur.copy_expert("COPY customers FROM STDIN WITH (FORMAT csv, HEADER true)", f)
conn.commit()
conn.close()
If you are at a psql prompt instead of writing Python, the client-side equivalent is one line:
plaintext\\copy customers FROM '/tmp/customers.csv' WITH (FORMAT csv, HEADER true)
Use \\copy, not server-side COPY ... FROM '/path', unless the file already sits on the database server and your role has pg_read_server_files. That distinction trips up a lot of people: server-side COPY reads a path on the Postgres host, while \\copyreads a file on your machine.
For anything recurring, you wrap this in a script and schedule it with cron or an orchestrator. That is also where this method starts to cost you, because now you own retries, logging, schema changes, and deduplication.
Method 3: AWS DMS
AWS Database Migration Service is the managed option if you live in AWS and need a periodic or one-time move. DMS reads from an S3 source (it needs an external table definition as a JSON file describing the data), stages through its replication instance, and writes into your Postgres target.
It is more setup than the first two methods and is built around batch tasks rather than instant loads, so it fits migrations and scheduled syncs more than low-latency needs. AWS supports Postgres as a DMS target across a range of versions, and that list moves as new releases ship, so check the current AWS DMS PostgreSQL documentation rather than trusting a fixed list. The migration user has to be a registered Postgres user with the right privileges.
Method 4: Estuary for Continuous, No-Code Loading
The three methods above all share a limitation: they load a file and stop. If new files keep landing in S3 and you want them in Postgres without re-running anything, you need something that watches the bucket. That is the problem a pipeline tool solves.
Here are the steps involved in using Flow for replicating your S3 data to Postgres.
Step 1: Capture S3 Data
To use Flow for replicating data from S3 to Postgres, you’ll need an Estuary account. If you don’t already have an account, register for a free account. And if you have one, log in to your account.
After you’ve logged in, you’ll be redirected to the Estuary dashboard. Click on Captures and then click on the New Capture button.
In the Search Connectors box, search for S3.
Click on the Capture button of the Amazon S3 connector option. This will redirect you to the Create Capture page of the S3 connector. You can use the connector to capture data from an S3 bucket, provided it is publicly accessible and allows anonymous reads or the S3 bucket must be accessible via a root or IAM user. Fill in the required fields like a Name for the connector, the AWS Region location of the S3 bucket, and the name of the S3 bucket.
After providing the details, click on the Next button. Flow connects to your S3 bucket and detects all the data sources. Then, click on Save and Publish.
Step 2: Set up Postgres as the Destination
Before you continue, make sure your Postgres instance is ready to receive data from Flow. See the prerequisites.
You can set up Postgres as your data destination by clicking on Materialize Connections in the pop-up that follows after a successful capture. Alternatively, you can navigate to the Estuary dashboard and click on Materializations. Then, click on the New Materialization button.
Search for PostgreSQL in the Search Connectors box. You’ll see the PostgreSQL connector option in the search results. Click on the Materialization button to proceed with setting up your data destination.
On the Postgres materialization connector page, you must fill in the required details; apart from a Name for the connector, Endpoint Config details like Username, Password, and Address of the database must be provided.
Now, click on Next.
Use the Collection Selector to choose the data collections captured from S3, if not automatically selected.
Then, click on Save and Publish. The connector will create new tables in your database based on your specifications.
If you’d like more information on the process, here’s the Estuary documentation to help out:
Where this is not the right call: a single one-time load of one file does not justify standing up a pipeline. Use Method 1 or 2 for that. Estuary earns its place when files keep coming and you do not want to babysit a script.
Common S3-to-Postgres Errors and How to Fix Them
These are the failures that actually eat your afternoon, across all four methods.
- The import aborts on one bad row. Both
aws_s3andCOPYare all-or-nothing by default. A single value that will not cast to the column type fails the whole load. Load into a staging table with forgiving types (text columns), then cast and insert into the real table once you have inspected the bad rows. - Encoding mismatches. Files exported from Windows tools are often not UTF-8, and Postgres rejects invalid byte sequences. Set the client encoding to match the file, for example
WITH (FORMAT csv, ENCODING 'LATIN1'), or convert the file to UTF-8 before loading. - Timestamps and numbers that will not parse. Date formats and thousands separators are the usual culprits. Stage as text, then cast with explicit format functions like
to_timestampso you control the parsing instead of hoping the default works. - The IAM role is missing or wrong (Method 1). If
table_import_from_s3cannot reach the bucket, the role is usually not attached to the instance's S3 import feature, or its policy does not cover the bucket. The region increate_s3_urinot matching the bucket causes the same symptom. - Partial loads when files are still arriving. If a process writes to S3 while you load, you can pick up an incomplete file. Load from a prefix that only contains finished files, or have the writer move objects into a "ready" prefix once complete.
- Memory blowups in Python. Reading a large CSV into pandas before loading will exhaust memory on big files. Stream with
copy_expertstraight from the file handle, as in Method 2, instead of loading the whole thing into a dataframe first.
Which Method Should You Use?
On RDS or Aurora and you need a CSV in a table now, the aws_s3 extension is the least work and has the fewest moving parts. On self-hosted Postgres, the Python and \\copy approach is the standard path, and it is fine until the maintenance adds up. For a managed migration inside AWS, DMS fits. And when new files keep landing and you want them in Postgres automatically without owning a script, a pipeline tool like Estuary is the low-effort way to get continuous loading.
Conclusion
The method that fits depends on where your Postgres runs and whether the loading is one-time or continuous. RDS and Aurora users have the easiest path with aws_s3, self-hosted users script it, AWS migrations suit DMS, and continuous file loading is where a no-code pipeline pays off.
If continuous, hands-off loading is what you are after, you can try Estuary for free or talk to the team about your setup.

About the author
Jeffrey is a data engineering professional with over 15 years of experience, helping early-stage data companies scale by combining technical expertise with growth-focused strategies. His writing shares practical insights on data systems and efficient scaling.










