
MongoDB to Elasticsearch: 4 Methods That Actually Work

Sync MongoDB to Elasticsearch using the native Elastic connector, Monstache, Logstash, or Estuary for real-time CDC. Step-by-step guide covering all current working methods.


Introduction

MongoDB and Elasticsearch are built for different things. MongoDB stores documents and handles transactional reads and writes at scale. Elasticsearch indexes those documents and makes them searchable across millions of records in milliseconds. Using them together is a common and well-established pattern: MongoDB as the system of record, Elasticsearch as the search and analytics layer on top of it.

The challenge is keeping them in sync. Every time a document is created, updated, or deleted in MongoDB, that change needs to reach Elasticsearch quickly enough to keep search results accurate. For a product catalog, a user profile system, or a content platform, stale search results directly affect user experience. For a fraud detection or monitoring system, stale data affects business outcomes.

If you have searched for how to do this and found articles recommending the MongoDB River plugin or Mongo Connector, ignore them. The River plugin was removed from Elasticsearch in version 1.5 in 2015 and has not worked for a decade. Mongo Connector was deprecated by MongoDB in 2020 and the repository is now archived. Both are completely non-functional. Any guide still recommending these tools is outdated and will waste your time.

This guide covers the four methods that actually work in 2026, what each one is suited for, and how to set each one up correctly.

What this guide covers:

  • The native Elastic MongoDB Connector for straightforward self-managed sync
  • Monstache for real-time open source sync using MongoDB change streams
  • Logstash with the MongoDB input plugin for teams already using the Elastic stack
  • Estuary for fully managed real-time CDC with no infrastructure to operate
  • How MongoDB BSON documents map to Elasticsearch index mappings
  • Troubleshooting the most common sync failures

Why Syncing MongoDB to Elasticsearch is Harder Than It Looks

On the surface, moving data from MongoDB to Elasticsearch seems straightforward. Both store JSON-like documents, both are schema-flexible, and both have mature client libraries in every major language. The reality is more complicated.

  • The data model mismatch is significant. MongoDB documents can contain deeply nested objects, arrays of mixed types, and fields that change type between documents in the same collection. Elasticsearch requires an explicit or inferred index mapping where every field has a defined type. When a nested MongoDB document arrives in Elasticsearch without a correctly configured mapping, Elasticsearch either rejects it, flattens it incorrectly, or maps it as a generic object that cannot be queried efficiently.
  • Deletes do not propagate automatically. When a document is deleted in MongoDB, nothing in Elasticsearch knows about it unless your sync mechanism explicitly handles delete events. Batch sync tools that only read current documents will never catch deletes, leaving stale documents in your Elasticsearch index indefinitely. For search use cases this means deleted products, deactivated users, or removed content continue appearing in search results.
  • Change streams require a replica set. The most reliable real-time sync methods for MongoDB use change streams, which require MongoDB to be running as a replica set or on MongoDB Atlas. A standalone MongoDB instance does not support change streams. If you are running a standalone instance and need real-time sync, you either need to convert it to a replica set first or use a batch polling approach with its inherent latency tradeoffs.
  • Index mappings conflict with schema evolution. MongoDB collections have no enforced schema, so new fields appear in documents over time as the application evolves. In Elasticsearch, adding a new field type that conflicts with an existing mapping causes indexing failures. Without a strategy for handling schema evolution, a sync pipeline that works today breaks silently when a developer adds a new field to a MongoDB document next month.

Understanding these four problems upfront helps you choose the right sync method and configure it correctly the first time.
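To make the schema-mismatch problem concrete, a hypothetical pre-flight check can scan a sample of documents and report fields whose types vary between documents — exactly the inconsistency that later breaks an Elasticsearch mapping. This is an illustrative sketch (the function name is made up, not part of any tool below):

```python
from collections import defaultdict

def audit_field_types(documents):
    """Report fields whose value type varies across a sample of documents.

    Fields like this will conflict with a fixed Elasticsearch mapping:
    the first document decides the inferred type, later ones get rejected.
    """
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}

# A collection where `price` drifted from int to str as the app evolved:
sample = [
    {"_id": "1", "price": 10, "name": "widget"},
    {"_id": "2", "price": "12.50", "name": "gadget"},
]
print(audit_field_types(sample))  # {'price': ['int', 'str']}
```

Running a check like this against a sample of each collection before the first sync tells you which fields need normalization or an explicit mapping.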

Choosing the Right Sync Method

The right method depends on three things: whether your MongoDB instance supports change streams, how much infrastructure you want to operate, and how fresh your Elasticsearch data needs to be.

  1. Native Elastic MongoDB Connector is the simplest option for teams that want an officially supported tool without writing code. It handles both initial sync and ongoing updates. Best for teams already on Elastic Cloud or running a managed Elastic deployment.
  2. Monstache is the most widely used open source option for real-time sync. It reads from MongoDB change streams, handles deletes correctly, and supports index mapping configuration. Best for teams that want real-time sync, are comfortable operating a lightweight Go daemon, and want full control over the sync behavior.
  3. Logstash with the MongoDB input plugin suits teams already running the ELK stack who want to add MongoDB as a data source. It is a batch polling approach rather than true real-time sync, which makes it appropriate for use cases where data freshness of a few minutes is acceptable.
  4. Estuary is the fully managed option. It uses MongoDB change streams to capture inserts, updates, and deletes in real time and materializes them into Elasticsearch indices continuously. No infrastructure to provision, no daemon to operate, and no custom code to write or maintain.
| | Native Elastic Connector | Monstache | Logstash | Estuary |
| --- | --- | --- | --- | --- |
| Real-time sync | Yes | Yes | No (batch) | Yes |
| Handles deletes | Yes | Yes | No | Yes |
| Requires replica set | Yes | Yes | No | Yes |
| Infrastructure to operate | Moderate | Low | Moderate | None |
| Supports MongoDB Atlas | Yes | Yes | Yes | Yes |
| Supports self-hosted MongoDB | Yes | Yes | Yes | Yes |
| Supports Amazon DocumentDB | No | Partial | Yes | Yes |
| Supports Azure Cosmos DB | No | No | Yes | Yes |
| Open source | Yes | Yes | Yes | No |
| Setup complexity | Low | Moderate | Moderate | Low |

Method 1 — Native Elastic MongoDB Connector

The native Elastic MongoDB Connector is an officially supported connector maintained by Elastic. It handles both the initial full sync and ongoing incremental updates, and it runs either as a managed connector on Elastic Cloud or as a self-hosted connector on your own infrastructure.

Prerequisites:

  • MongoDB 3.6 or later running as a replica set, or MongoDB Atlas
  • Elasticsearch 8.5.1 or later, or an Elastic Cloud deployment
  • A MongoDB user with read permission on the databases you want to sync
  • Network access between the connector and both MongoDB and Elasticsearch

Setting up the connector on Elastic Cloud:

Go to your Elastic Cloud deployment and navigate to Search, then Connectors. Click New Connector and select MongoDB from the connector catalog.

Fill in the connection details:

  • MongoDB connection string in the format mongodb+srv://username:password@cluster.mongodb.net for Atlas, or mongodb://username:password@hostname:27017 for self-hosted
  • Database name to sync from
  • Collection name, or leave blank to sync all collections in the database

Click Save and run a full sync to load existing documents into Elasticsearch. Once the full sync completes, enable incremental sync to capture ongoing changes using MongoDB change streams.

Elastic creates an index in Elasticsearch for each MongoDB collection, named using the format database-name.collection-name by default.

Setting up the connector as self-hosted:

If you are not on Elastic Cloud, you can run the connector as a service on your own infrastructure using the Elastic connector framework:

```bash
# Install the connector service
pip install elastic-connectors

# Configure and run
elastic-ingest --config config.yml
```

Your config.yml needs the Elasticsearch endpoint, API key, and MongoDB connection details. The full configuration reference is in Elastic's connector documentation.

Limitations to be aware of:

The native connector does not support MongoDB views or time series collections, only standard collections. It also does not support Amazon DocumentDB or Azure Cosmos DB, even though both use MongoDB-compatible APIs. For those environments use Estuary or Logstash instead.

Incremental sync frequency on the self-hosted version depends on your polling configuration. On Elastic Cloud the managed connector runs incremental syncs on a schedule you define, typically every few minutes rather than true real-time sub-second latency.

Method 2 — Monstache

Monstache is an open source sync daemon written in Go that reads from MongoDB change streams and indexes documents into Elasticsearch in real time. It is actively maintained, widely used in production, and handles the four core sync problems correctly: real-time updates, deletes, schema mapping, and resume after failure.

Unlike the native Elastic connector which runs on a sync schedule, Monstache listens to the MongoDB change stream continuously and pushes changes to Elasticsearch within milliseconds of them occurring in MongoDB.

Prerequisites:

  • MongoDB 3.6 or later running as a replica set or on MongoDB Atlas
  • Elasticsearch 7.x or 8.x
  • Go 1.20 or later if building from source, or use the pre-built binary

Installation:

Download the latest release binary from the Monstache GitHub releases page:

```bash
# Linux
wget https://github.com/rwynn/monstache/releases/download/v6.7.12/monstache-linux-amd64.zip
unzip monstache-linux-amd64.zip
chmod +x monstache
```

Or build from source:

```bash
go install github.com/rwynn/monstache@latest
```

Basic configuration:

Monstache uses a TOML configuration file. A minimal configuration that syncs a single MongoDB collection to Elasticsearch looks like this:

```toml
mongo-url = "mongodb://username:password@hostname:27017"
elasticsearch-urls = ["https://your-elasticsearch-host:9200"]
elasticsearch-user = "elastic"
elasticsearch-password = "your-password"

# Sync specific namespaces (database.collection format)
namespace-regex = '^mydb\.products$'

# Enable change stream mode (default, requires replica set)
change-stream-namespaces = ["mydb.products"]
```

Run Monstache with your config file:

```bash
./monstache -f config.toml
```

Handling deletes:

By default Monstache propagates deletes from MongoDB to Elasticsearch automatically. When a document is deleted in MongoDB, Monstache receives the delete event from the change stream and issues a delete request to Elasticsearch for the corresponding document. No additional configuration is required.

Configuring index naming:

By default Monstache creates Elasticsearch indices using the MongoDB namespace format database.collection. You can override this with a custom index mapping:

```toml
[[mapping]]
namespace = "mydb.products"
index = "products"
```

Handling resume after failure:

Monstache stores its resume token from the MongoDB change stream in Elasticsearch by default. If Monstache restarts after a failure, it reads the resume token and continues from where it left off without missing any changes. This makes it safe to restart without data loss as long as the MongoDB change stream window has not expired.
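The resume mechanics reduce to a simple loop: every change event carries a token, the consumer persists the token after applying each event, and a restart skips everything up to the stored token. This is an illustrative model of that pattern, not Monstache's actual code:

```python
def consume(events, checkpoint):
    """Apply change events, persisting the resume token after each one.

    `events` is a list of (token, change) pairs standing in for the change
    stream; `checkpoint` is a dict standing in for the resume document that
    the sync tool keeps in durable storage.
    """
    applied = []
    start = checkpoint.get("token")
    for token, change in events:
        if start is not None and token <= start:
            continue  # already processed before the restart
        applied.append(change)
        checkpoint["token"] = token  # persist position after applying
    return applied

stream = [(1, "insert a"), (2, "update a"), (3, "delete a")]
ckpt = {}
consume(stream[:2], ckpt)        # process the first two events, then "crash"
resumed = consume(stream, ckpt)  # restart: skips what was already applied
print(resumed, ckpt["token"])    # ['delete a'] 3
```

The real failure mode comes when the stored token falls out of the MongoDB oplog window, which is covered under Problem 4 in the troubleshooting section.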

Configure the resume token storage location explicitly:

```toml
resume = true
resume-name = "monstache-resume"
```

Running Monstache in production:

For production deployments, run Monstache as a system service using systemd to ensure it restarts automatically after failures:

```ini
# /etc/systemd/system/monstache.service
[Unit]
Description=Monstache MongoDB to Elasticsearch sync
After=network.target

[Service]
ExecStart=/usr/local/bin/monstache -f /etc/monstache/config.toml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
systemctl enable monstache
systemctl start monstache
```

Limitations:

Monstache requires a MongoDB replica set. It does not work with standalone MongoDB instances. It also does not officially support Amazon DocumentDB, which implements only a subset of the MongoDB change stream API. For DocumentDB use Estuary's dedicated DocumentDB connector instead.

Method 3 — Logstash with MongoDB Input Plugin

Logstash is part of the Elastic stack and the natural choice for teams already running ELK infrastructure who want to add MongoDB as a data source. It polls MongoDB on a defined schedule rather than using change streams, making it a batch sync approach rather than real-time. For use cases where data freshness of a few minutes is acceptable, it works reliably without requiring a MongoDB replica set.

Prerequisites:

  • Logstash 7.x or 8.x
  • MongoDB 3.6 or later, standalone or replica set
  • Elasticsearch 7.x or 8.x
  • The logstash-input-mongodb plugin

Install the MongoDB input plugin:

```bash
bin/logstash-plugin install logstash-input-mongodb
```

Verify the installation:

```bash
bin/logstash-plugin list | grep mongodb
```

Basic pipeline configuration:

Logstash pipelines are defined in .conf files with three sections: input, filter, and output. Here is a complete configuration that polls a MongoDB collection every 5 minutes and indexes documents into Elasticsearch:

```ruby
input {
  mongodb {
    uri => "mongodb://username:password@hostname:27017/mydb"
    placeholder_db_dir => "/tmp/logstash-mongodb"
    placeholder_db_name => "logstash_sqlite.db"
    collection => "products"
    batch_size => 5000
    generateId => true
    since_table => "products_since"
    schedule => "*/5 * * * *"
  }
}

filter {
  mutate {
    rename => { "_id" => "[@metadata][mongo_id]" }
  }
}

output {
  elasticsearch {
    hosts => ["https://your-elasticsearch-host:9200"]
    user => "elastic"
    password => "your-password"
    index => "products"
    document_id => "%{[@metadata][mongo_id]}"
  }
}
```

The key configuration fields are:

  • placeholder_db_dir and placeholder_db_name define where Logstash stores its position tracking using a local SQLite database. This is how Logstash knows which documents it has already processed on subsequent runs.
  • schedule uses cron syntax to define how often Logstash polls MongoDB. */5 * * * * runs every 5 minutes.
  • batch_size controls how many documents Logstash reads per poll. Increase this for larger collections but monitor memory usage.
  • generateId set to true tells the plugin to use the MongoDB _id field as the Logstash document ID, which maps correctly to the Elasticsearch document ID and prevents duplicates on re-runs.

Handling nested MongoDB documents:

MongoDB documents with nested objects need a filter step to flatten them before indexing into Elasticsearch, otherwise they land as opaque objects that cannot be queried by field:

```ruby
filter {
  ruby {
    # Use a single-quoted Logstash string so the Ruby code inside can use
    # double-quoted strings for interpolation without escaping
    code => '
      event.to_hash.each do |key, value|
        if value.is_a?(Hash)
          value.each do |subkey, subvalue|
            event.set("#{key}_#{subkey}", subvalue)
          end
          event.remove(key)
        end
      end
    '
  }
}
```

This flattens a nested document like address.city into a top-level field address_city that Elasticsearch can index and query directly.
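The same one-level flattening the Ruby filter performs can be sketched as a plain Python function, which makes it easier to reason about what the transformed document looks like (the function name is illustrative, not part of Logstash):

```python
def flatten_one_level(doc):
    """Lift each nested dict one level up, joining keys with '_'.

    Mirrors the Logstash ruby filter: {'address': {'city': 'Oslo'}}
    becomes {'address_city': 'Oslo'} so Elasticsearch can index it as a
    top-level field.
    """
    flat = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            for subkey, subvalue in value.items():
                flat[f"{key}_{subkey}"] = subvalue
        else:
            flat[key] = value
    return flat

print(flatten_one_level({"name": "Ada", "address": {"city": "Oslo", "zip": "0150"}}))
# {'name': 'Ada', 'address_city': 'Oslo', 'address_zip': '0150'}
```

Note that this handles only one level of nesting; documents nested more deeply would need the function applied recursively.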

Running the pipeline:

```bash
bin/logstash -f /etc/logstash/conf.d/mongodb-to-elasticsearch.conf
```

For production, run Logstash as a managed service and place your pipeline config in /etc/logstash/conf.d/. Logstash reads all .conf files in that directory on startup.

Limitations:

Logstash with the MongoDB input plugin does not handle deletes. It only reads documents that exist in MongoDB at the time of each poll. Documents deleted from MongoDB remain in Elasticsearch until you manually remove them or rebuild the index. If your use case requires delete propagation, use Monstache or Estuary instead.

The polling approach also means changes that occur between poll intervals are not immediately visible in Elasticsearch. For most analytics and reporting use cases this is acceptable. For search experiences where users expect immediate results after a change, the latency is a meaningful limitation.

Method 4 — Sync MongoDB to Elasticsearch Using Estuary

Estuary is a managed data pipeline platform that connects MongoDB to Elasticsearch using change streams, capturing every insert, update, and delete in real time and materializing them into Elasticsearch indices continuously. Unlike Monstache or Logstash, there is no daemon to deploy, no infrastructure to operate, and no custom configuration to maintain when MongoDB schemas evolve.

Estuary's approach fits the right-time data model: you get sub-second latency when your use case demands it, or you configure a less frequent materialization cadence for workloads where that makes more sense, all within the same pipeline without changing tools.

Prerequisites:

  • MongoDB 3.6 or later running as a replica set, MongoDB Atlas, Amazon DocumentDB, or Azure Cosmos DB
  • An Elasticsearch cluster with a known endpoint, self-hosted or on Elastic Cloud
  • An Estuary account at dashboard.estuary.dev
  • Estuary IP addresses allowlisted in MongoDB and Elasticsearch if your instances restrict inbound connections

Supported MongoDB platforms:

This is where Estuary has a meaningful advantage over Monstache and the native Elastic connector. Estuary supports:

  • Self-hosted MongoDB replica sets
  • MongoDB Atlas
  • Amazon DocumentDB via a dedicated connector variant
  • Azure Cosmos DB for MongoDB via a dedicated connector variant

If you are running DocumentDB or Cosmos DB, Estuary is the only method in this guide that supports them natively.

Step 1: Create an Estuary account

Go to dashboard.estuary.dev and register for a free account using Google or GitHub credentials.

Step 2: Create a MongoDB capture

Click Sources in the left navigation, then New Capture. Search for MongoDB in the connector catalog and select the appropriate variant for your environment: MongoDB, Amazon DocumentDB, or Azure Cosmos DB.

[Screenshot: MongoDB connector in the Estuary source catalog]

Fill in the connection details:

  • Address: Your MongoDB connection string. For Atlas use the mongodb+srv:// format. For self-hosted use mongodb://hostname:27017. If you are connecting with a user that has access to all databases, append ?authSource=admin to authenticate through the admin database.
  • User and Password: Credentials for a MongoDB user with read access on the databases you want to capture.
  • Database: The specific database to capture from.
[Screenshot: MongoDB source configuration]

Click Next. Estuary connects to MongoDB and auto-detects all collections available for capture.

Step 3: Select capture mode per collection

This is a detail that matters for correctness. Estuary supports three capture modes, configured per collection:

  • Change Stream Incremental is the default and preferred mode. It uses MongoDB change streams to capture inserts, updates, and deletes in real time as they occur. This requires a replica set or Atlas.
  • Batch Snapshot performs a full collection scan on a defined schedule. Use this for MongoDB views or time series collections that do not support change streams, or for standalone MongoDB instances without replica set support.
  • Batch Incremental scans only documents where the cursor field value is higher than the last observed value. Use this for append-only collections. For best performance, ensure the cursor field has an index in MongoDB.

For most production use cases, Change Stream Incremental is the right choice. Select your collections and click Save and Publish. Estuary immediately begins a backfill of existing documents and simultaneously listens for new change stream events so no changes are missed during the initial load.
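The Batch Incremental cursor logic described above can be sketched as follows — each poll fetches only documents whose cursor field exceeds the last observed value, then advances the cursor. This is a simplified model, not Estuary's actual implementation:

```python
def poll_incremental(collection, cursor_field, last_seen):
    """Return new documents and the updated cursor value.

    `collection` stands in for a MongoDB collection as a list of dicts;
    a real poll would issue find({cursor_field: {"$gt": last_seen}}),
    which is why an index on the cursor field matters for performance.
    """
    new_docs = sorted(
        (d for d in collection if last_seen is None or d[cursor_field] > last_seen),
        key=lambda d: d[cursor_field],
    )
    next_cursor = new_docs[-1][cursor_field] if new_docs else last_seen
    return new_docs, next_cursor

events = [{"seq": 1}, {"seq": 2}, {"seq": 3}]
batch, cursor = poll_incremental(events, "seq", last_seen=1)
print([d["seq"] for d in batch], cursor)  # [2, 3] 3
```

The model also shows why this mode suits append-only collections: updates to already-seen documents do not move the cursor field, so they are never picked up again.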

Step 4: Create an Elasticsearch materialization

Click Destinations, then New Materialization. Search for Elasticsearch in the connector catalog and select it.

[Screenshot: Elasticsearch destination materialization]

Fill in the Elasticsearch connection details:

  • Endpoint: Your Elasticsearch cluster URL, for example https://your-cluster.es.io:9243 for Elastic Cloud or https://hostname:9200 for self-hosted
  • Username and Password: Credentials for an Elasticsearch user with the following index privileges on the target indices: auto_configure, create_index, delete, delete_index, index, read
[Screenshot: Elasticsearch destination configuration]

Step 5: Configure index bindings

Estuary maps each MongoDB collection to an Elasticsearch index. By default, the index name matches the collection name. You can rename target indices at this stage.

For each binding, you can also select specific fields to include or exclude before data reaches Elasticsearch, which is useful for dropping internal MongoDB fields or sensitive data you do not want indexed.

Step 6: Save and publish

Click Save and Publish. Estuary begins materializing MongoDB documents into Elasticsearch indices immediately. Within minutes of publishing, you should see documents appearing in your Elasticsearch indices.

Verify the sync is working by querying Elasticsearch directly:

```bash
curl -X GET "https://your-elasticsearch-host:9200/your-index/_count" \
  -u elastic:your-password
```

The document count should match your MongoDB collection count once the initial backfill completes.

To verify real-time sync, insert a document in MongoDB and query Elasticsearch for it within seconds:

```bash
curl -X GET "https://your-elasticsearch-host:9200/your-index/_search" \
  -u elastic:your-password \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"field_name": "test_value"}}}'
```

Why Estuary handles the hard parts automatically:

The four sync problems described in Section 2 are all handled without manual configuration. Deletes are propagated automatically via change stream delete events. Schema evolution in MongoDB is detected and the Elasticsearch index mapping is updated without pipeline intervention. Resume after failure uses the change stream resume token so no changes are lost during outages. And the backfill plus real-time sync run in parallel on first setup so you never have a gap between historical data and live changes.


MongoDB to Elasticsearch Data Mapping

Getting the pipeline running is only half the work. The other half is ensuring MongoDB documents map correctly to Elasticsearch index mappings so your data is queryable, searchable, and structured the way your application expects. This is where most sync pipelines fail silently after initial setup.

How Elasticsearch Index Mappings Work

When a document arrives in Elasticsearch for the first time, Elasticsearch either uses an existing mapping you have defined or infers one automatically through dynamic mapping. Dynamic mapping is convenient but produces incorrect field types surprisingly often when the source is MongoDB, because BSON types do not map cleanly to Elasticsearch field types in every case.

The safest approach for a MongoDB to Elasticsearch pipeline is to define your index mapping explicitly before the first document arrives, then disable dynamic mapping for fields you do not control. This prevents Elasticsearch from making incorrect type inferences that are painful to fix after data is already indexed.

Common MongoDB to Elasticsearch Type Mappings

| MongoDB BSON Type | Elasticsearch Type | Notes |
| --- | --- | --- |
| String | text or keyword | Use text for full-text search fields. Use keyword for exact match, filtering, and aggregations. Use both with a multi-field mapping for fields that need both. |
| Int32, Int64 | integer, long | Map directly. Use long for MongoDB Int64 to avoid overflow. |
| Double | double | Maps directly. |
| Boolean | boolean | Maps directly. |
| Date | date | MongoDB stores dates as BSON Date. Elasticsearch expects ISO 8601 format. Verify your sync tool converts correctly. |
| ObjectId | keyword | MongoDB _id ObjectId fields should be mapped as keyword in Elasticsearch since they are used for exact match lookups, not full-text search. |
| Nested object | object or nested | Use object for simple nested documents. Use nested when you need to query sub-fields independently. See below. |
| Array | Array of the element type | Elasticsearch handles arrays natively for most types. Arrays of objects require nested mapping. |
| Null | Any type | Elasticsearch accepts null for any field type. No special handling needed. |
| Decimal128 | scaled_float or keyword | No native Elasticsearch equivalent. Use scaled_float for numeric operations or keyword if you only need to store and retrieve the value. |
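The table above can be approximated in code. A hypothetical helper that suggests an Elasticsearch type for a decoded BSON value might look like this (a sketch; it assumes ObjectId and Decimal128 values arrive already serialized as strings):

```python
from datetime import datetime

def suggest_es_type(value, full_text=False):
    """Suggest an Elasticsearch field type for a decoded BSON value.

    Simplified version of the mapping table: strings default to keyword
    (the safer choice) unless flagged for full-text search.
    """
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"             # long covers Int64 without overflow
    if isinstance(value, float):
        return "double"
    if isinstance(value, datetime):
        return "date"
    if isinstance(value, str):
        return "text" if full_text else "keyword"
    if isinstance(value, dict):
        return "object"           # or "nested" for arrays of objects
    if isinstance(value, list):
        return suggest_es_type(value[0], full_text) if value else "keyword"
    return "keyword"

print(suggest_es_type(True), suggest_es_type(42), suggest_es_type("sku-1"))
# boolean long keyword
```

The bool-before-int ordering is the kind of subtlety that also trips up real sync tools: a MongoDB Boolean mis-detected as an integer produces a mapping that later rejects true/false values.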

The Object vs Nested Mapping Problem

This is the most common mapping mistake in MongoDB to Elasticsearch pipelines and the one most likely to produce silently incorrect query results.

MongoDB documents frequently contain arrays of embedded objects. For example an orders collection might have a line_items field containing an array of objects each with product_id, quantity, and price fields.

If you map line_items as an object type in Elasticsearch, which is the default, Elasticsearch flattens the array internally. This means a query for orders where line_items.product_id is ABC and line_items.quantity is greater than 5 may return incorrect results because Elasticsearch correlates field values across different array elements rather than within the same element.

If you map line_items as nested, Elasticsearch preserves the relationship between fields within each array element and queries produce correct results.
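The cross-matching behavior can be simulated directly. With the object mapping, Elasticsearch effectively collapses the array into parallel value lists, so a conjunctive query can be satisfied by different array elements; a sketch of both evaluation styles:

```python
def object_style_match(line_items, product_id, min_quantity):
    """Simulate the flattened `object` mapping.

    Field values are pooled across all array elements, so the two
    conditions can be satisfied by *different* line items.
    """
    product_ids = [item["product_id"] for item in line_items]
    quantities = [item["quantity"] for item in line_items]
    return product_id in product_ids and any(q > min_quantity for q in quantities)

def nested_style_match(line_items, product_id, min_quantity):
    """Simulate the `nested` mapping: both conditions must hold within one element."""
    return any(
        item["product_id"] == product_id and item["quantity"] > min_quantity
        for item in line_items
    )

order = [
    {"product_id": "ABC", "quantity": 1},   # matches the id, not the quantity
    {"product_id": "XYZ", "quantity": 10},  # matches the quantity, not the id
]
print(object_style_match(order, "ABC", 5))  # True  -- false positive
print(nested_style_match(order, "ABC", 5))  # False -- correct
```

No single line item satisfies both conditions, yet the object-style evaluation returns a match — which is exactly how silently incorrect search results arise.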

Define nested mappings explicitly before indexing:

```json
PUT /orders
{
  "mappings": {
    "properties": {
      "line_items": {
        "type": "nested",
        "properties": {
          "product_id": { "type": "keyword" },
          "quantity": { "type": "integer" },
          "price": { "type": "double" }
        }
      }
    }
  }
}
```

Nested mappings have a performance cost. Elasticsearch stores nested objects as hidden separate documents internally. For collections with large arrays of nested objects, monitor index size and query performance after enabling nested mapping.

Text vs Keyword for String Fields

MongoDB stores all strings as BSON strings without distinguishing between fields used for search and fields used for filtering. Elasticsearch makes this distinction explicitly and it affects both indexing behavior and query performance significantly.

  • Use text mapping for fields your users search against: product names, descriptions, article content, user bios. Text fields are analyzed, meaning Elasticsearch tokenizes and normalizes the content to support full-text search including stemming and relevance scoring.
  • Use keyword mapping for fields you filter, sort, or aggregate on: status codes, category names, user IDs, country codes. Keyword fields are stored as-is and support exact match queries and aggregations efficiently.

For fields that need both, use a multi-field mapping:

```json
"product_name": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}
```

This lets you run full-text search queries on product_name and exact match or aggregation queries on product_name.keyword from the same indexed field.
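The behavioral difference can be illustrated with a toy analyzer — a deliberately simplified model of Elasticsearch's standard analyzer, where text fields are lowercased and tokenized before matching while keyword fields compare the exact stored value:

```python
def analyze(value):
    """Toy analyzer: lowercase and split on whitespace."""
    return value.lower().split()

def text_match(stored, query):
    """text field: match if any query token appears among the stored tokens."""
    return any(token in analyze(stored) for token in analyze(query))

def keyword_match(stored, query):
    """keyword field: exact, unanalyzed comparison."""
    return stored == query

stored = "Red Widget Deluxe"
print(text_match(stored, "widget"))                # True  -- full-text hit
print(keyword_match(stored, "widget"))             # False -- partial value
print(keyword_match(stored, "Red Widget Deluxe"))  # True  -- exact value
```

The same asymmetry explains why keyword is right for status codes and IDs: users filter on the complete value, and analysis would only add overhead and surprises.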

Handling MongoDB ObjectId Fields

The MongoDB _id field is an ObjectId by default, a 12-byte BSON type that looks like 507f1f77bcf86cd799439011 when serialized to a string. Most sync tools convert ObjectId to its string representation when writing to Elasticsearch.

Map _id and any other ObjectId reference fields as keyword in Elasticsearch:

```json
"_id": { "type": "keyword" },
"user_id": { "type": "keyword" },
"product_id": { "type": "keyword" }
```

Do not map ObjectId fields as text. Text mapping applies full-text analysis which tokenizes the ObjectId string and makes exact match lookups unreliable.

Managing Schema Evolution

MongoDB collections have no enforced schema, so new fields appear over time as your application evolves. Elasticsearch dynamic mapping handles new fields automatically by default, but this can produce incorrect type inferences for new fields.

The safest production configuration uses dynamic mapping templates to control how new fields are handled:

```json
PUT /products
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ],
    "properties": {
      "description": { "type": "text" },
      "product_name": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      }
    }
  }
}
```

This template maps any new string fields that arrive without an explicit mapping as keyword by default, which is a safer default than text for unknown fields. Fields you know need full-text search are defined explicitly in the properties section.
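The resolution order can be sketched in code: explicit properties win, unknown strings fall through to the keyword template, and everything else gets simple inference. This is an illustrative model of the behavior, not Elasticsearch's implementation:

```python
# Explicit mappings from the properties section above
EXPLICIT = {"description": "text", "product_name": "text"}

def resolve_new_field(name, value, explicit=EXPLICIT):
    """Resolve a field mapping, mimicking the dynamic template above.

    Explicit properties take precedence; unknown strings become keyword
    (the strings_as_keywords template); other types use simple inference.
    """
    if name in explicit:
        return explicit[name]
    if isinstance(value, str):
        return "keyword"          # safer default than text for unknown fields
    if isinstance(value, bool):   # bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"

print(resolve_new_field("description", "A red widget"))  # text (explicit)
print(resolve_new_field("supplier_code", "ACME-7"))      # keyword (template)
print(resolve_new_field("stock_count", 12))              # long
```

When a developer adds supplier_code to MongoDB documents next month, the pipeline keeps working: the field lands as keyword instead of triggering a text inference you did not plan for.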

Troubleshooting Common MongoDB to Elasticsearch Sync Problems

Problem 1: Change Streams Not Available

Error:

```
Failed to watch change stream: not primary or secondary; cannot replicate
```

or

```
The $changeStream stage is only supported on replica sets
```

Cause: Your MongoDB instance is running as a standalone server rather than a replica set. Change streams require a replica set or MongoDB Atlas.

Fix: Convert your standalone MongoDB instance to a single-node replica set. This does not require additional servers and works on a single machine:

```bash
# Start mongod with replica set enabled
mongod --replSet rs0 --bind_ip localhost
```

Then connect with mongosh and initiate the replica set:

```javascript
rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "localhost:27017" }] })
```

After initiating, verify the replica set status:

```javascript
rs.status()
```

The output should show the node as PRIMARY. Change streams are now available and your sync tool can connect normally.

If converting to a replica set is not possible in your environment, switch to Logstash with the MongoDB input plugin which uses polling and does not require a replica set, or use Estuary's Batch Snapshot capture mode.

Problem 2: Documents Rejected Due to Mapping Conflicts

Error in Elasticsearch logs:

```
mapper_parsing_exception: failed to parse field [field_name] of type [integer]
```

or

```
illegal_argument_exception: mapper [field_name] cannot be changed from type [keyword] to [text]
```

Cause: A document arrived in Elasticsearch with a field value that conflicts with the existing index mapping. This happens most often when a MongoDB collection contains documents where the same field holds different types across documents, for example some documents have price as an integer and others have it as a string.

Fix: First identify all documents in the affected MongoDB collection where the field type is inconsistent:

```javascript
db.products.find({ price: { $type: "string" } }).limit(10)
```

Then decide on the correct type and either clean the MongoDB data or handle the type conversion in your sync configuration. For Monstache, use a JavaScript mapping function to normalize the field before it reaches Elasticsearch:

```toml
# In the Monstache config
[[script]]
namespace = "mydb.products"
script = """
module.exports = function(doc) {
  if (typeof doc.price === 'string') {
    doc.price = parseFloat(doc.price);
  }
  return doc;
}
"""
```

For existing indices with a mapping conflict you cannot resolve through data cleaning, the only option is to delete the index, fix the mapping, and re-index:

```bash
# Delete the conflicting index
curl -X DELETE "https://your-elasticsearch-host:9200/products" \
  -u elastic:your-password

# Create the index with the correct mapping, then trigger a full re-sync
```

Problem 3: Deletes Not Propagating to Elasticsearch

Symptom: Documents deleted from MongoDB continue appearing in Elasticsearch search results.

Cause: Your sync method does not handle delete events. Logstash with the MongoDB input plugin does not capture deletes. Batch polling tools in general only read documents that currently exist and have no mechanism for detecting removals.

Fix options:

If you are using Logstash, the only reliable fix is to implement a soft delete pattern in MongoDB. Instead of deleting documents, add a deleted_at timestamp field and filter deleted documents in your application queries. In Logstash, sync the deleted_at field to Elasticsearch and handle filtering there. This avoids true deletes entirely.
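The soft delete pattern is a small change in application code: replace the delete with an update that stamps the document, and exclude stamped documents from reads. A runnable sketch of the logic (in mongosh the equivalent would be an `updateOne` with `$set: { deleted_at: new Date() }` and a `find` filtering on `deleted_at: { $exists: false }`):

```javascript
// Soft delete sketch: documents are never removed, only stamped.
function softDelete(doc) {
  return { ...doc, deleted_at: new Date() };
}

// Reads (and Elasticsearch-side filters) exclude stamped documents.
function visibleDocs(docs) {
  return docs.filter((d) => d.deleted_at === undefined);
}

const docs = [{ _id: 1, name: 'kept' }, softDelete({ _id: 2, name: 'hidden' })];
console.log(visibleDocs(docs).map((d) => d._id)); // [ 1 ]
```

Because the stamped document still exists in MongoDB, Logstash's batch polling picks up the `deleted_at` change like any other update, which is what makes the pattern work without true delete events.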

If soft deletes are not acceptable, switch to Monstache or Estuary, both of which receive delete events from MongoDB change streams and propagate them to Elasticsearch automatically.

Problem 4: Resume Token Expired After Downtime

Error with Monstache:

plaintext
resume token not found in change stream

or

plaintext
ChangeStreamHistoryLost: Resume of change stream was not possible

Cause: Monstache stores a resume token from the MongoDB change stream to recover its position after a restart. MongoDB change streams have a retention window tied to the oplog size. If Monstache is down long enough for the oplog to rotate past the stored resume token position, the token becomes invalid and Monstache cannot resume.

Fix: Increase the MongoDB oplog size to give your sync tool more time to recover after outages. On a replica set:

plaintext language-javascript
// Check current oplog size
db.getReplicationInfo()

// Increase oplog size to 10GB (size is in megabytes)
db.adminCommand({ replSetResizeOplog: 1, size: 10240 })

For immediate recovery when the token has already expired, delete the stored resume token and trigger a full re-sync. In Monstache this means clearing the resume record from Elasticsearch:

plaintext language-bash
curl -X DELETE "https://your-elasticsearch-host:9200/monstache-resume" \
  -u elastic:your-password

Then restart Monstache. It will perform a full backfill of all MongoDB documents before resuming change stream capture.

For Estuary, resume token management is handled automatically. If a resume token expires during an extended outage, Estuary detects this and automatically triggers a re-backfill of the affected collections without manual intervention.

Problem 5: Elasticsearch Indexing Performance Degrading at Scale

Symptom: Sync latency increases over time, Elasticsearch CPU and memory usage climbs, and indexing throughput drops as the MongoDB collection grows.

Common causes and fixes:

Shard misconfiguration is the most frequent cause of Elasticsearch performance problems at scale. The default of one shard per index works well for small indices, but a single shard becomes inefficient once it grows well past roughly 50GB, while splitting data across many small shards adds per-shard overhead of its own. For indices expected to exceed 50GB, configure sharding explicitly when creating the index:

plaintext language-json
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Refresh interval set too low forces Elasticsearch to make new documents searchable too frequently. The default refresh interval of 1 second is aggressive for high-volume indexing. During bulk indexing operations increase it temporarily:

plaintext language-bash
curl -X PUT "https://your-elasticsearch-host:9200/products/_settings" \
  -u elastic:your-password \
  -H "Content-Type: application/json" \
  -d '{"index": {"refresh_interval": "30s"}}'

Reset to the default after bulk indexing completes.

Nested mappings on large arrays add significant indexing overhead because Elasticsearch stores each nested object as a separate hidden document. If you have nested mappings on fields with large arrays, evaluate whether the default object mapping is acceptable instead: it flattens arrays, so a query can no longer require multiple conditions to match within the same array element, but the indexing cost is far lower at scale.
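As an illustration (the products index and variants field here are hypothetical), the two options differ only in the field's mapping type. The nested type preserves per-element query semantics at higher indexing cost:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size":  { "type": "keyword" }
        }
      }
    }
  }
}
```

Removing `"type": "nested"` falls back to the default object mapping, which eliminates the hidden per-element documents but means a query for `color: red AND size: XL` can match those values drawn from different array elements.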

Conclusion

Syncing MongoDB to Elasticsearch is a solved problem in 2026, but only if you are using tools that are still actively maintained. The MongoDB River plugin and Mongo Connector are both dead. Any guide still recommending them is outdated.

Here is a quick reference for choosing the right method for your situation:

| Situation | Recommended Method |
| --- | --- |
| Already on Elastic Cloud, want simplest setup | Native Elastic MongoDB Connector |
| Need real-time sync, comfortable operating a daemon | Monstache |
| Already running ELK stack, batch latency acceptable | Logstash with MongoDB input plugin |
| Need fully managed pipeline, no infrastructure to operate | Estuary |
| Running Amazon DocumentDB | Estuary |
| Running Azure Cosmos DB for MongoDB | Estuary |
| Standalone MongoDB without replica set | Logstash or Estuary Batch Snapshot mode |
| Need sub-second sync with delete propagation | Monstache or Estuary |

Whichever method you choose, the data mapping section of this guide is worth revisiting after your pipeline is running. Getting the connection working is straightforward. Getting the Elasticsearch index mappings right so your data is correctly queryable is where most pipelines need tuning.

Start Syncing MongoDB to Elasticsearch in Real Time

Estuary connects to your MongoDB instance using change streams, captures every insert, update, and delete as it happens, and keeps your Elasticsearch indices in sync continuously. It works across self-hosted MongoDB, MongoDB Atlas, Amazon DocumentDB, and Azure Cosmos DB without separate tools or custom configuration for each environment.

No daemon to deploy. No infrastructure to operate. No resume token management. No mapping conflicts to debug manually.

Setup takes under 30 minutes. Your first pipeline is free.

Start building at dashboard.estuary.dev.



FAQs

Does MongoDB need to be running as a replica set for real-time sync?

Yes. Any method that uses change streams, including Monstache and Estuary, requires MongoDB to be running as a replica set or on MongoDB Atlas. A standalone MongoDB instance does not support change streams. If converting to a replica set is not possible, use Logstash, which uses batch polling and works on standalone instances, or use Estuary's Batch Snapshot capture mode.

Which methods propagate deletes from MongoDB to Elasticsearch?

Monstache and Estuary both propagate deletes correctly because they read from MongoDB change streams, which include delete events. The native Elastic connector also handles deletes. Logstash with the MongoDB input plugin does not handle deletes because it only reads documents that currently exist at poll time. If delete propagation is a requirement, Logstash is not the right choice for your use case.

Is there a no-code way to sync MongoDB to Elasticsearch?

Yes. Both the native Elastic MongoDB Connector and Estuary are no-code options that handle the full sync through a UI-based setup without writing scripts or configuration files. Estuary additionally handles schema evolution, delete propagation, and resume after failure automatically, making it the lower-maintenance option of the two for ongoing production pipelines.

Can I sync Amazon DocumentDB or Azure Cosmos DB for MongoDB to Elasticsearch?

Yes. Estuary has dedicated connector variants for both Amazon DocumentDB and Azure Cosmos DB for MongoDB. This is a meaningful difference from Monstache and the native Elastic connector, neither of which officially supports these platforms. If you are running either managed database service and need a reliable sync to Elasticsearch, Estuary is the most straightforward option currently available.


About the author

Jeffrey Richman

With over 15 years in data engineering, Jeffrey is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. His extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.
