Estuary

DynamoDB Stream to Elasticsearch (Integration Guide)

Learn how to stream DynamoDB changes to Elasticsearch for real-time search and analysis. Explore code-based and no-code integration methods.

Most modern applications need more than simple key-value lookups. They need full-text search, filtering across multiple fields, faceted search, autocomplete, or near real-time analytics over operational data. DynamoDB is excellent for low-latency transactional workloads, but it is not designed to be a full-text search engine.

A DynamoDB to Elasticsearch pipeline solves this by keeping DynamoDB as the primary application database and using Elasticsearch as the dedicated search and analytics index. As items are inserted, updated, or deleted in DynamoDB, those changes can be streamed into Elasticsearch so users can search fresh application data quickly.

In this post, we’ll look at two reliable methods to stream data from DynamoDB to Elasticsearch: using Estuary and using AWS Lambda with DynamoDB Streams.

If you are indexing more than DynamoDB, see the ways to get data into Elasticsearch, with a method-by-method comparison and a decision table.

How to Stream Data From DynamoDB to Elasticsearch

There are two methods you can use to stream data from DynamoDB to Elasticsearch:

  1. Method 1: Using Estuary for Streaming DynamoDB to Elasticsearch
  2. Method 2: Using AWS Lambda for DynamoDB Stream to Elasticsearch

Method | Best for | Freshness | Complexity
Estuary | Managed DynamoDB to Elasticsearch pipelines using DynamoDB Streams | Real-time or near real-time | Low
AWS Lambda | Teams building custom AWS-native stream processing | Real-time or near real-time | Medium to high

Method 1: Using Estuary for Streaming DynamoDB to Elasticsearch

Estuary can stream DynamoDB changes into Elasticsearch using DynamoDB Streams. Once streams are enabled on the DynamoDB tables you want to capture, Estuary continuously captures inserts, updates, and deletes into Estuary collections and then materializes those collections into Elasticsearch indices.

Prerequisites

  • An Estuary account.
  • One or more DynamoDB tables with DynamoDB Streams enabled.
  • AWS credentials with permission to discover and read the relevant DynamoDB tables and streams.
  • An Elasticsearch cluster with a known endpoint.
  • An Elasticsearch role with the required privileges for the target indices.
  • Network access between Estuary, DynamoDB, and Elasticsearch.

Estuary’s DynamoDB connector requires access to list tables in the AWS region. If you see an AccessDeniedException, check whether the IAM policy allows dynamodb:ListTables using the required table resource pattern.

Step 1: Configure DynamoDB as the Source

  • Log in to your Estuary account.
  • Click on the Sources tab on the left navigation pane.
  • Click on the + NEW CAPTURE button.
  • Next, search for DynamoDB using the Search connectors field and click the connector’s Capture button to begin configuring it as the data source.
  • On the Create Capture page, enter the required details: Name, Access Key ID, Secret Access Key, and Region.
  • After filling in the required fields, click on NEXT > SAVE AND PUBLISH. This will capture data from DynamoDB into Estuary collections.

Step 2: Configure Elasticsearch as the Destination

  • Once the source is set, click MATERIALIZE COLLECTIONS in the pop-up window or the Destinations option on the dashboard.
  • Click on the + NEW MATERIALIZATION button on the Destinations page.
  • Type Elastic in the Search connectors box and click on the Materialization button of the connector when you see it in the search results.
  • On the Create Materialization page, enter the required details: Name, Endpoint, Username, Password, and Index Replicas.
  • If your collection of data from DynamoDB isn’t filled automatically, you can add it manually using the Link Capture button in the Source Collections section.
  • Finally, click on NEXT > SAVE AND PUBLISH to materialize data from your Flow collections to Elasticsearch.
  • With the source and destination configured, Estuary will begin loading data from the Flow collections to Elasticsearch.

The Elasticsearch user or API key used by Estuary should have the monitor cluster privilege and read, write, view_index_metadata, and create_index privileges for the target indices.
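
The privileges listed above can be written down as an Elasticsearch role definition. The following is a minimal sketch, assuming a hypothetical index pattern `dynamodb-*`; the dictionary follows the general shape accepted by Elasticsearch's create-role security API (`PUT /_security/role/<name>`), but the index names and role name are yours to choose.

```python
import json


def build_role_body(index_patterns):
    """Return a role definition granting the privileges named in this guide."""
    return {
        # Cluster-level privilege.
        "cluster": ["monitor"],
        # Index-level privileges, limited to the target index patterns.
        "indices": [
            {
                "names": list(index_patterns),
                "privileges": [
                    "read",
                    "write",
                    "view_index_metadata",
                    "create_index",
                ],
            }
        ],
    }


# "dynamodb-*" is a hypothetical index pattern used only for illustration.
role_json = json.dumps(build_role_body(["dynamodb-*"]), indent=2)
```

Scoping `names` to the indices Estuary writes to, rather than `*`, keeps the credential from touching unrelated indices.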

If you need deleted DynamoDB items to be removed from Elasticsearch search results, review the Elasticsearch connector’s delete behavior. Estuary tracks delete events with _meta/op set to d; depending on the destination configuration, you may want hard deletes instead of soft-delete markers.

Benefits of Using Estuary

Here are some of the benefits of Estuary.

  • No-code Configuration: Estuary is designed to be user-friendly and does not require extensive technical expertise to configure a source and destination. Its more than 200 connectors reduce setup to a few clicks.
  • Real-time Data Processing With CDC: Estuary leverages Change Data Capture (CDC) for real-time data processing. This helps maintain data integrity and reduces latency.
  • Scalability: Estuary is designed to handle large data flows and supports up to 7 GB/s, so the pipeline can scale as data volumes in DynamoDB and Elasticsearch grow.
  • Efficient Data Transformations: Estuary supports TypeScript and SQL transformations. TypeScript transformations are fully type-checked, which helps prevent common pipeline failures and protects data integrity during migration. The platform’s native SQL transformations provide an easy-to-use alternative for reshaping, filtering, and joining data in real time.

Method 2: Using AWS Lambda for DynamoDB Stream to Elasticsearch

Streaming data from DynamoDB to Elasticsearch can significantly enhance your application’s search capabilities. Here are the detailed steps involved in this method that uses AWS Lambda for the integration.

Step 1: Create Your DynamoDB Table With Streams Enabled

  • Create a DynamoDB table in the AWS Management Console.
  • Enable DynamoDB Streams on the table and set the stream view type to New Image.
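
If you prefer to script this step instead of using the console, the same setting can be applied through the DynamoDB `UpdateTable` API. This is a minimal sketch, assuming the table already exists; the function names are illustrative, and `dynamodb_client` is expected to be a boto3 DynamoDB client such as `boto3.client("dynamodb")`.

```python
def build_stream_update(table_name):
    """Build UpdateTable arguments that enable a NEW_IMAGE stream."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            # NEW_IMAGE: each stream record carries the item as it looks
            # after the change, which is what gets indexed downstream.
            "StreamViewType": "NEW_IMAGE",
        },
    }


def enable_stream(dynamodb_client, table_name):
    """Enable the stream and return its ARN if the response includes one."""
    response = dynamodb_client.update_table(**build_stream_update(table_name))
    return response["TableDescription"].get("LatestStreamArn")
```

The returned stream ARN is the value the Lambda trigger points at in a later step.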

Step 2: Create an IAM Role for Lambda Execution 

  • Your Lambda function needs permission to read from DynamoDB and write to your Elasticsearch domain.
  • Create an IAM role whose policies grant access to Amazon Elasticsearch Service (ES), DynamoDB Streams, and CloudWatch Logs for Lambda execution.

Here’s an example: 

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpPost",
        "es:ESHttpPut",
        "es:ESHttpDelete",
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
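
The example policy uses `"Resource": "*"`, which is convenient for a demo but broader than most production setups want. Below is a hedged sketch of how the same permissions might be scoped to one table's stream and one search domain. The function name, account ID, table name, and domain name are all hypothetical, and `es:ESHttpDelete` is included because the Lambda handler in this guide issues DELETE requests. `dynamodb:ListStreams` is left on `*` as a conservative assumption.

```python
import json


def build_lambda_policy(region, account_id, table_name, domain_name):
    """Build an IAM policy document scoped to one stream and one domain."""
    stream_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}/stream/*"
    domain_arn = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    logs_arn = f"arn:aws:logs:{region}:{account_id}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read change records from this table's stream only.
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                ],
                "Resource": stream_arn,
            },
            {   # Assumption: ListStreams is kept unscoped.
                "Effect": "Allow",
                "Action": ["dynamodb:ListStreams"],
                "Resource": "*",
            },
            {   # Index, update, and delete documents in this domain only.
                "Effect": "Allow",
                "Action": ["es:ESHttpPost", "es:ESHttpPut", "es:ESHttpDelete"],
                "Resource": domain_arn,
            },
            {   # Let the function write its own logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": logs_arn,
            },
        ],
    }


# All argument values here are placeholders for illustration.
policy_json = json.dumps(
    build_lambda_policy("us-east-1", "123456789012", "orders", "ddb-to-es"),
    indent=2,
)
```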

Step 3: Create an Elasticsearch Domain

  • In the AWS Management Console, create an Amazon OpenSearch Service domain. AWS renamed Amazon Elasticsearch Service to Amazon OpenSearch Service, though older AWS examples and existing domains may still use Elasticsearch terminology.
  • Configure the domain settings as needed, including access policies to allow the Lambda function to post data.

Step 4: Create a Lambda Function

  • Create a Lambda function and choose a runtime (e.g., Python or Node.js).
  • Write the function code to process records from DynamoDB streams and post them to Elasticsearch.

Here is sample code in Python.

python
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
# Sign every request with the Lambda execution role's credentials
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# The Amazon ES domain endpoint, including https://
host = 'https://search-ddb-to-es-r7dcdoy4caeoklst3yseumqmre.us-east-1.es.amazonaws.com'
index = 'lambda-index'
url = host + '/' + index + '/_doc/'

headers = {"Content-Type": "application/json"}


def handler(event, context):
    count = 0
    for record in event['Records']:
        # Get the primary key for use as the Elasticsearch ID
        doc_id = record['dynamodb']['Keys']['id']['S']

        if record['eventName'] == 'REMOVE':
            # The item was deleted in DynamoDB, so delete the document too
            r = requests.delete(url + doc_id, auth=awsauth)
        else:
            # INSERT or MODIFY: index the item's new image under the same ID
            document = record['dynamodb']['NewImage']
            r = requests.put(url + doc_id, auth=awsauth, json=document, headers=headers)
        count += 1
    return str(count) + ' records processed.'

Note: In production, convert DynamoDB AttributeValue objects into normal JSON before indexing, choose a stable document ID from the table key, and use the bulk API for batches instead of one request per record.
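
The two recommendations in the note can be sketched together: first convert each DynamoDB AttributeValue map into plain JSON, then assemble one NDJSON body for the Elasticsearch `_bulk` endpoint. This is an illustrative sketch rather than a drop-in replacement. The function names are hypothetical, and it assumes a single-attribute key named `id`, as in the sample above.

```python
import json
from decimal import Decimal


def from_attribute_value(av):
    """Convert one AttributeValue such as {"S": "x"} to a plain value."""
    (type_tag, value), = av.items()
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        number = Decimal(value)
        is_whole = number == number.to_integral_value()
        return int(number) if is_whole else float(number)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute_value(item) for item in value]
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    if type_tag in ("SS", "BS"):
        return sorted(value)
    if type_tag == "NS":
        return sorted(from_attribute_value({"N": n}) for n in value)
    if type_tag == "B":
        return value  # binary arrives base64-encoded in stream events
    raise ValueError(f"Unsupported AttributeValue type: {type_tag}")


def build_bulk_body(records, index, key_name="id"):
    """Turn stream records into an NDJSON body for the _bulk API."""
    lines = []
    for record in records:
        keys = record["dynamodb"]["Keys"]
        doc_id = str(from_attribute_value(keys[key_name]))
        if record["eventName"] == "REMOVE":
            action = {"delete": {"_index": index, "_id": doc_id}}
            lines.append(json.dumps(action))
        else:
            image = record["dynamodb"]["NewImage"]
            document = {k: from_attribute_value(v) for k, v in image.items()}
            action = {"index": {"_index": index, "_id": doc_id}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(document))
    # The bulk API expects a newline after the final line.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be sent in a single POST to `<host>/_bulk` with the `Content-Type: application/x-ndjson` header. The bulk response reports success or failure per item, so the caller should inspect it rather than rely on the HTTP status alone.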

Step 5: Configure The DynamoDB Stream Trigger

  • In the Lambda function’s configuration, add a new trigger.
  • Select DynamoDB as the trigger type and choose the DynamoDB table created in Step 1.
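
The console trigger corresponds to a Lambda event source mapping, which can also be created through the API. The sketch below shows one plausible set of arguments. The batch size and retry count are arbitrary illustrative values, the function names are hypothetical, and `lambda_client` is expected to be a boto3 Lambda client such as `boto3.client("lambda")`.

```python
def build_event_source_mapping(stream_arn, function_name, dlq_arn=None):
    """Build arguments for Lambda's create_event_source_mapping call."""
    kwargs = {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        # Start from the oldest record still retained in the stream.
        "StartingPosition": "TRIM_HORIZON",
        "BatchSize": 100,           # illustrative value
        "MaximumRetryAttempts": 3,  # illustrative value
        # Split a failing batch in two to isolate a bad record.
        "BisectBatchOnFunctionError": True,
        # Let the handler report individual failed records.
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
    if dlq_arn:
        # Send details of discarded batches to an SQS queue or SNS topic.
        kwargs["DestinationConfig"] = {"OnFailure": {"Destination": dlq_arn}}
    return kwargs


def create_trigger(lambda_client, stream_arn, function_name, dlq_arn=None):
    """Create the DynamoDB stream trigger for the function."""
    kwargs = build_event_source_mapping(stream_arn, function_name, dlq_arn)
    return lambda_client.create_event_source_mapping(**kwargs)
```

Setting a retry limit and a failure destination means a record that can never be indexed is eventually set aside instead of blocking its shard until it expires.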

Step 6: Test the Setup

  • After the setup, make changes to your Amazon DynamoDB table and verify that the changes are reflected in your Elasticsearch domain.
  • You can use Kibana or the Elasticsearch API to query or visualize the data and confirm that it matches the changes made in DynamoDB.
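
A quick check through the Elasticsearch API could look like the sketch below, which runs a simple match query against the index. The function names and the `title` field are hypothetical. `http_post` is any requests-style callable such as `requests.post`, and extra keyword arguments (for example `auth=awsauth`) are passed through to it. Note that if documents are indexed as raw AttributeValue maps, as in the sample handler, the field path includes the type tag (for example `title.S`).

```python
def build_search_request(host, index, field, text):
    """Return the URL and JSON body for a simple match query."""
    url = host.rstrip("/") + "/" + index + "/_search"
    body = {"query": {"match": {field: text}}}
    return url, body


def search(http_post, host, index, field, text, **request_kwargs):
    """Run the query with a requests-style callable and return the hits."""
    url, body = build_search_request(host, index, field, text)
    response = http_post(url, json=body, **request_kwargs)
    response.raise_for_status()
    return response.json()["hits"]["hits"]
```

After inserting or updating an item in DynamoDB, a matching hit whose `_id` equals the item's key suggests the pipeline is working. After deleting the item, the same query should stop returning it.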

These steps complete a DynamoDB Streams to Elasticsearch pipeline using AWS Lambda. However, this method has several limitations.

  • DynamoDB Streams 24-hour retention limit: DynamoDB Streams retains records for only 24 hours. If the Lambda function fails to process records within this window, those records are lost permanently.
  • Lambda Function Code and Dependencies: As your data streaming requirements evolve, you’ll need to update your Lambda function to handle schema changes, add error handling, and so on, which adds operational overhead.
  • Technical Expertise: Building custom Lambda functions requires solid knowledge of both programming and the AWS ecosystem, which can be a barrier for non-technical users.
  • No automatic historical backfill: DynamoDB Streams starts capturing changes after streams are enabled. If you need existing table data in Elasticsearch, you must run a separate backfill.
  • Batch failure handling: Lambda retries failed batches, so one bad record can block progress unless you configure partial batch response, retries, and a DLQ.
  • Mapping conflicts: DynamoDB’s flexible item structure can create Elasticsearch mapping conflicts if the same attribute appears with different types.
  • Indexing throughput: High-write DynamoDB tables may require batching, bulk indexing, concurrency tuning, and backpressure handling.
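
The batch failure point above can be made concrete. With `ReportBatchItemFailures` enabled on the event source mapping, a handler can return the sequence number of the record that failed, so Lambda retries from that record instead of replaying the whole batch. This is a minimal sketch; `process_batch`, `make_handler`, and the `index_one` callable are hypothetical names, with `index_one` standing in for whatever function indexes a single record.

```python
def process_batch(records, index_one):
    """Index records in order and report the first failure, if any."""
    failures = []
    for record in records:
        try:
            index_one(record)
        except Exception:
            # For stream sources Lambda checkpoints at the lowest reported
            # sequence number and retries from there, so stop at the first
            # failure instead of processing later records twice.
            sequence_number = record["dynamodb"]["SequenceNumber"]
            failures.append({"itemIdentifier": sequence_number})
            break
    return {"batchItemFailures": failures}


def make_handler(index_one):
    """Wrap an indexing function as a Lambda handler."""
    def handler(event, context):
        return process_batch(event["Records"], index_one)
    return handler
```

An empty `batchItemFailures` list tells Lambda the whole batch succeeded. Because records at and after the failed one are retried, the indexing call should be idempotent, which using the table key as the document ID helps with.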

Key Takeaways

Streaming DynamoDB changes to Elasticsearch adds full-text search and near real-time analytics to an application while DynamoDB continues to handle transactional workloads. AWS Lambda can implement this, but building and maintaining the function is time-consuming, requires technical expertise, and leaves room for errors.

Estuary is an excellent solution for those who want an easy and automated way to stream data from DynamoDB to Elasticsearch without the need for extensive technical knowledge. The method you choose depends on your needs and level of expertise.

If your Elasticsearch project also includes document databases, see our guide to syncing MongoDB to Elasticsearch.

Estuary provides an extensive and growing list of connectors, robust functionalities, and a user-friendly interface. Sign up today to simplify and automate DynamoDB stream to Elasticsearch.

FAQs

    Can you integrate Elasticsearch with DynamoDB?

    Yes. The most common pattern is to keep DynamoDB as the application database and stream DynamoDB item changes into Elasticsearch or Amazon OpenSearch Service for full-text search, filtering, and analytics. You can do this with a managed pipeline such as Estuary, with AWS Lambda and DynamoDB Streams, or with AWS-native OpenSearch ingestion options.

    Is DynamoDB Streams the same as Kinesis Data Streams?

    No. DynamoDB Streams and Kinesis Data Streams are separate services. DynamoDB Streams captures item-level changes from a DynamoDB table, while Kinesis Data Streams is a general-purpose streaming service. They have similar APIs, and AWS provides a DynamoDB Streams Kinesis Adapter for some KCL-based processing patterns.

About the author

Jeffrey Richman, Data Engineering & Growth Specialist

Jeffrey is a data engineering professional with over 15 years of experience, helping early-stage data companies scale by combining technical expertise with growth-focused strategies. His writing shares practical insights on data systems and efficient scaling.
