
Moving data from Amazon S3 to Amazon Redshift is one of the most common workflows for analytics teams. S3 is often used as a low-cost landing zone for raw files, while Redshift is used to query, model, and analyze that data at scale.
There are three common ways to load data from S3 to Redshift:
- Use the Redshift
COPYcommand for fast bulk loading. - Use AWS Glue for managed ETL and transformation workflows.
- Use Estuary when you want an automated, low-maintenance pipeline with near real-time sync.
The right method depends on how often your data changes, how much transformation you need, and how much infrastructure you want to manage.
Quick Comparison: S3 to Redshift Methods
| Method | Best For | Pros | Limitations |
|---|---|---|---|
Redshift COPY command | Bulk loading files from S3 | Fast, native, cost-effective | Requires SQL, IAM setup, manual orchestration |
| AWS Glue | Batch ETL and transformation jobs | Serverless, AWS-native, good for transformations | More setup, often batch-oriented, can be complex |
| Estuary | Automated or real-time pipelines | Low-code, continuous sync, easier operations | Basic technical knowledge |
Before You Load Data from S3 to Redshift
Whichever method you pick, you will need:
- An S3 bucket containing your source files
- A Redshift cluster or Redshift Serverless workgroup, and a target database
- IAM permissions that let Redshift or Glue read your S3 bucket (IAM roles for COPY)
- Data files in a supported format such as CSV, JSON, Parquet, Avro, ORC, or delimited text
- A plan for schema changes, duplicate rows, failed loads, and refresh frequency
For the COPY and Glue methods you also create the destination table yourself. For Estuary, the connector creates the tables, which changes the prerequisites slightly (covered in Method 3).
Method #1: Using COPY Command to Load Data from S3 to Redshift
The COPY command is the fastest native way to bulk load files from S3 into Redshift. It is the right choice when your files are already prepared and you want a direct load into an existing table.
It fits teams that load large batches of already-clean files, are comfortable with SQL and IAM roles, and do not need real-time sync.
When to Use the COPY Command
Use COPY when:
- You need to load large batches of files from S3
- Your data is already clean or lightly transformed
- You are comfortable writing SQL
- You can manage IAM roles and load orchestration
- You do not need real-time synchronization
Step 1: Create or Identify the S3 Bucket
Upload your files to an S3 bucket. Organize them by source, table, and date when possible.
Example structure:
plaintexts3://company-data/events/year=2026/month=06/day=08/This makes it easier to manage incremental loads and troubleshoot failed files.
Step 2: Create the Redshift Target Table
Before running COPY, create the destination table in Redshift.
Example:
sqlCREATE TABLE public.events (
event_id VARCHAR(255),
user_id VARCHAR(255),
event_type VARCHAR(100),
event_time TIMESTAMP,
properties SUPER
);
Make sure the Redshift schema matches the structure of your S3 files.
Step 3: Configure IAM Permissions
Redshift needs permission to read objects from your S3 bucket. The recommended approach is to attach an IAM role to your Redshift cluster or workgroup and grant that role the required S3 permissions.
At minimum, the role usually needs permission to list the bucket and read the objects being loaded.
Step 4: Run the COPY Command
Example for JSON data:
sqlCOPY public.events
FROM 's3://company-data/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS JSON 'auto';Example for CSV data:
sqlCOPY public.customers
FROM 's3://company-data/customers/customers.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
IGNOREHEADER 1;
Watch out for: duplicate and failed loads
This is where COPY bites teams in production. COPY does not deduplicate. If a load fails partway and you re-run it against the same S3 prefix, you can load the same rows twice and silently double your counts. The recovery is manual: truncate and reload, or load into a staging table and MERGE into the target. Two habits prevent it. Load through a staging table and MERGE on a key rather than appending directly, and use a COPY manifest so the command loads exactly the files you intend, not whatever happens to be in the prefix.
A single malformed row also aborts the entire batch by default. For large loads, set an explicit error tolerance and inspect the load-error system tables so one bad file does not block the rest.
COPY is native, fast, and cheap, but you own the orchestration, scheduling, error handling, and monitoring for every recurring load.
Method #2: Load Data from S3 to Redshift Using AWS Glue
AWS Glue is a serverless data integration service. It can discover data, catalog metadata, transform files, and write data into Redshift.
AWS Glue is a better fit than a plain COPY command when you need transformations, schema handling, or managed ETL jobs.
Important Note About Glue Crawlers
A Glue crawler does not load S3 data into Redshift. A crawler scans your data and creates or updates metadata tables in the Glue Data Catalog. To actually move data into Redshift you still need a Glue ETL job or Glue Studio workflow. Teams routinely lose a day to this misunderstanding, so plan for the ETL job from the start.
When to Use AWS Glue
Use AWS Glue when:
- You need to transform data before loading it
- You want serverless ETL inside AWS
- You need scheduled batch jobs
- You already use the Glue Data Catalog
- You need to read from multiple AWS data sources
Step 1: Prepare the S3 Source Data
Store your files in S3 and use a consistent structure. For example:
plaintexts3://company-data/orders/
s3://company-data/customers/
s3://company-data/events/Use partitioned folders if your data is time-based.
Step 2: Create an IAM Role for AWS Glue
Create an IAM role that allows Glue to:
- Read from the S3 source bucket
- Write temporary files if needed
- Connect to Redshift
- Access any required secrets or credentials
Step 3: Use a Glue Crawler to Catalog S3 Data
Create a Glue crawler and point it to the S3 path. The crawler detects the schema and creates metadata tables in the Glue Data Catalog.
This is useful for understanding file structure, but it is not the same as loading the data into Redshift.
Step 4: Create a Glue ETL Job
Create an AWS Glue job that reads from the S3 source or Glue Catalog table, applies any transformations, and writes the data to Redshift.
Typical transformations include:
- Renaming columns
- Casting data types
- Filtering records
- Flattening nested JSON
- Handling null values
- Deduplicating records
Step 5: Schedule and Monitor the Job
You can run the job manually or schedule it. Monitor job runs, failures, and data quality checks to ensure the Redshift tables are updated correctly.
AWS Glue Pros
- Serverless ETL
- Good for transformations
- Integrates with S3, Redshift, and the Glue Data Catalog
- Useful for scheduled batch pipelines
AWS Glue Cons
- More complex than the COPY command
- Debugging jobs can take time
- Not always the best fit for low-latency pipelines
- Requires AWS Glue, IAM, Redshift, and S3 configuration
Method #3: Load Data from S3 to Redshift Using Estuary
Estuary is a data integration platform that can help automate data movement from S3 to Redshift without requiring teams to manually manage COPY commands, Glue jobs, or orchestration scripts.
This method is useful when you want a managed pipeline that can keep Redshift updated as source data changes.
When to Use Estuary
Use Estuary when:
- You want an easier setup than custom AWS jobs
- You need automated S3 to Redshift pipelines
- You want to reduce pipeline maintenance
- You need near real-time or continuous data movement
- You prefer a low-code data integration workflow
Here's the following steps to use Estuary for loading data from S3 to Redshift:
Prerequisite: Set up Redshift to connect with Estuary. Learn how here.
Step 1: Setup Estuary Account
Step 2: Source Data Capture Setup
- Once you’ve logged in, locate the Captures option on the left side of the Estuary dashboard. On the Captures page, click on the New Capture button. This step is to set up the source end of the data pipeline.
- Search for S3 in Source Connectors.
- Then, click on the Capture button. You’ll be directed to the S3 Capture connector page.
- Fill in all the required fields, then click on Next. Then click on Save and Publish. This will capture your data in S3 into a Estuary collection.
Step 3: Destination Configuration
- The next step is to set up the destination end of the pipeline. In this case, it’s Amazon Redshift. Click on Materializations on the left side of the Estuary dashboard. On the Materializations page, click on New Materialization.
- Search for Amazon Redshift in Search Connectors, then click on the Materialization button.
- You’ll be redirected to the Redshift connector page, where you must fill in the required fields like Address, database user name, and Password. You will also have to provide the details of Amazon S3 buckets, along with the access and secret access key for read/write permission.
- Once you’ve filled in the required information, click on Next, then click on Save and Publish.
This will set up the automated data transfer process from S3 to Redshift. For more information on the process, here’s all the documentation to help out:
- How to Set Up a Data Pipeline like this one
- The Amazon S3 Source Connector
- The Amazon Redshift Materialization Connector
Estuary Pros
- Easier setup than building custom Glue jobs
- Supports automated data movement
- Reduces operational overhead
- Good fit for teams that want managed pipelines
- Useful when data needs to stay fresh in Redshift
Estuary Cons
- Less customizable than fully custom AWS code
- Requires connector configuration and platform onboarding
Which S3 to Redshift Method Should You Choose?
Choose the COPY command if you need a fast, low-cost way to bulk load files and your team is comfortable with SQL and AWS permissions.
Choose AWS Glue if you need to transform data before loading it, run scheduled ETL jobs, or manage data discovery through the Glue Data Catalog.
Choose Estuary if you want an easier, managed way to automate S3 to Redshift data pipelines and reduce the time spent maintaining ingestion jobs.
Best Practices for Loading S3 Data into Redshift
Use Columnar File Formats When Possible
For analytics workloads, columnar formats like Parquet are often more efficient than raw CSV because they reduce scan costs and improve query performance.
Compress Large Files
Compression can reduce storage and transfer costs. It can also improve load performance depending on file size and format.
Keep File Sizes Reasonable
Avoid loading thousands of tiny files if possible. Very small files can create overhead. Very large files can be harder to retry when failures happen.
Use IAM Roles Instead of Static Credentials
Use IAM roles for Redshift and Glue access to S3. Avoid embedding long-lived AWS access keys in SQL scripts or application code.
Monitor Load Errors
For COPY jobs, inspect load errors and failed rows. For Glue jobs, monitor job logs, retries, and failed transformations.
Plan for Schema Changes
S3 files can change over time. Decide how your pipeline should handle new columns, missing fields, type changes, and nested data.
Build Idempotent Loads
Design pipelines so rerunning a job does not create duplicate records. Use staging tables, merge logic, or deduplication keys where appropriate.
Conclusion
For fast, cost-effective bulk loads, the Redshift COPY command is hard to beat. For serverless ETL with transformations, AWS Glue fits. For an automated pipeline that creates tables, stages through S3, and keeps Redshift current with less manual work, Estuary is the simpler path.
To set up an automated S3-to-Redshift pipeline, the next step is to create an S3 capture and a Redshift materialization, which takes minutes. Start for free, or talk to our team about your setup.
FAQs
With Estuary, do I create the Redshift table or does the connector?
Can I load nested JSON from S3 into Redshift?
Should I use batch or real-time loading for S3 to Redshift?

About the author
Jeffrey is a data engineering professional with over 15 years of experience, helping early-stage data companies scale by combining technical expertise with growth-focused strategies. His writing shares practical insights on data systems and efficient scaling.










