
Large-scale businesses generate vast amounts of data daily. For decades, Oracle has been a trusted database that many organizations use to store and manage their enterprise data. However, organizations need ways to use that data beyond the system where it originates in order to derive timely insights and make strategic decisions. This is where Change Data Capture (CDC) comes in.
CDC lets you capture and transfer real-time data changes from Oracle to other platforms, ensuring up-to-date and accessible information. Apache Kafka, a popular event streaming platform, plays a key role in providing reliable and continuous data movement in this CDC process. With Kafka, you can effectively process, store, and distribute Oracle data to various destinations for advanced analytics and business intelligence.
This article explores two effective methods to help your organization with real-time data streaming from Oracle to Apache Kafka. Before getting started, let’s understand the capabilities of both platforms. If you’re ready to set up the streaming workflow, skip to the Oracle to Kafka integration approaches.
A Brief Overview of Oracle
Oracle is a powerful relational database management system (RDBMS) developed by Oracle Corporation, available both on-premises and in the cloud. It is built to store, manage, and process large volumes of business-critical data. With its robust architecture and enterprise-grade features, Oracle has become one of the most widely used RDBMSs across industries; it supports applications requiring scalability, high availability, and fault tolerance.
One of Oracle's notable features is Automatic Storage Management (ASM). It is an integrated, high-performance database file system and disk manager that allows you to streamline storage management in Oracle databases. Instead of requiring manual administration of potentially thousands of database files, ASM automates storage allocation and optimization. This is done by organizing disks into different disk groups, ensuring efficient data distribution.
Oracle utilizes multi-version concurrency control (MVCC) to further improve database performance. MVCC allows multiple users to access the database simultaneously without conflicts, ensuring data consistency in a multiuser environment. Oracle’s advanced indexing techniques, like B-tree, bitmap, and function-based indexes, also enhance efficiency by accelerating data retrieval and optimizing query processing.
A Brief Overview of Kafka
Kafka, or Apache Kafka, is a distributed streaming platform initially developed by LinkedIn and later open-sourced under the Apache Software Foundation. It allows you to publish, store, process, and subscribe to streams of records in real-time. This makes Kafka ideal for building real-time data pipelines and event-driven applications.
Fundamentally, Kafka follows a publish-subscribe messaging model. In this approach, you interact with a Kafka cluster of multiple brokers that manage real-time data with minimal latency. Brokers serve as the backbone of Kafka, storing and delivering data through topics. Topics are logical channels: producers (publishers) write messages to them, and consumers (subscribers) read messages from them in near real-time.
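To make these roles concrete, here is a minimal, hypothetical producer using the confluent-kafka Python client; the broker address and the orders topic are placeholder values.

from confluent_kafka import Producer

# Connect to the Kafka cluster; localhost:9092 is a placeholder broker address.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Invoked once per message to confirm delivery or surface an error.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

# Publish a record to the "orders" topic; any subscribed consumer will receive it.
producer.produce("orders", key="order-42", value='{"status": "created"}',
                 callback=delivery_report)
producer.flush()  # block until outstanding messages are delivered

A consumer would subscribe to the same topic and read these records independently, at its own pace.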
Some key features of Kafka include:
- Exactly-Once Support: Kafka Streams supports exactly-once processing using transactional producers and consumers, so you can transfer and process data between Kafka topics without duplication or data loss (a minimal transactional-producer sketch follows this list). For pipelines that read from or write to external systems, Kafka Connect’s automatic offset management simplifies achieving similar guarantees.
- Geo-Replication: To enhance disaster recovery, Kafka lets you replicate data between geographically distributed Kafka clusters using tools like Kafka MirrorMaker 2.0, which is designed to mirror data across Apache Kafka clusters in real-time.
- Scalability: Kafka scales horizontally through its partitioning mechanism, which splits each topic into separate partitions. Each partition is an ordered, immutable sequence of messages. As more partitions are added to a topic, consumers in a consumer group can read from different partitions in parallel, distributing the workload efficiently.
- High Throughput: By utilizing Kafka, you can handle millions of messages per second with minimal latency, making it suitable for high-speed data streaming. This is achieved using zero-copy technology, which allows direct data transfer from the disk to the network buffer. By bypassing unnecessary memory copies, Kafka reduces I/O overhead, optimizes resource utilization, and increases the message processing rates.
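As a rough illustration of the exactly-once support mentioned above, the sketch below uses the transactional API of the confluent-kafka Python client; the broker address, transactional ID, and topic are placeholders, and end-to-end exactly-once processing between topics is typically handled by Kafka Streams (processing.guarantee=exactly_once_v2).

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "orders-etl-1",  # required to enable transactions
    "enable.idempotence": True,
})

producer.init_transactions()
producer.begin_transaction()
try:
    # All records produced inside the transaction become visible atomically.
    producer.produce("orders-enriched", key="order-42",
                     value='{"status": "created", "region": "EU"}')
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()  # none of the records become visible
    raise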
How to Stream Data from Oracle to Kafka (2 Methods)
For effective Oracle to Kafka streaming, you can use one of the following methods:
- The Easy Way: Automate Oracle to Kafka integration using Estuary Flow.
- The Manual Method: Manually move data from Oracle to Kafka using Kafka Connect JDBC Connector.
Method 1: Automate Oracle to Kafka Integration Using Estuary Flow
Estuary Flow is a real-time data integration tool that enables you to build ETL, ELT, CDC, batch, and streaming pipelines in minutes. With Estuary’s user-friendly interface, you can capture data from various databases, apply transformations using streaming SQL, and deliver the data to your chosen destination.
As you capture data, Flow automatically organizes each stream into a reusable collection with unlimited storage. This durable, append-only transaction log is stored securely in your private account, so you can enforce custom security rules and encryption for complete control over your data.
Here are some of Estuary Flow’s popular features:
- Change Data Capture (CDC): You can perform incremental data loads using Estuary Flow’s end-to-end streaming CDC feature. It allows you to capture changes made to the source and replicate them to a destination. The entire real-time data transfer process executes with a sub-100 millisecond latency.
- No-Code Connectors: By leveraging its 200+ pre-built connectors, Estuary Flow lets you streamline data integration workflows. With these connectors, you can effortlessly connect multiple sources and destinations without writing code, reducing the need for manual pipeline configuration.
- Multi-Cloud Deployment: Estuary Flow provides three deployment modes for secure, scalable, and personalized data integration requirements: Public Deployment, Private Deployment, and BYOC (Bring Your Own Cloud). The Public option is fully managed, the Private option lets you run Estuary’s data infrastructure in your own private environment, and the BYOC plan offers additional control and security by letting you deploy the platform in your own cloud environment.
Let’s see how to stream data from Oracle to Kafka via Estuary Flow. Before you begin, make sure the following prerequisites are in place:
- An active Estuary Flow account.
For Oracle:
- Create a dedicated Estuary Flow user with read-only access to the tables you want to replicate.
- An Oracle database running version 11g or higher.
- Enable connections from Estuary Flow to your Oracle database if they are in separate virtual private clouds.
If you are unable to use LogMiner with your Oracle instance, you may instead want to consider Estuary’s batch Oracle connector. The short check below can help you find out.
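As a quick sanity check, this hypothetical script uses the python-oracledb driver to confirm that the instance is in ARCHIVELOG mode and has supplemental logging enabled, two common requirements for LogMiner-based CDC; the connection details are placeholders.

import oracledb

# Placeholder credentials and DSN (host:port/service_name).
conn = oracledb.connect(user="flow_capture", password="secret",
                        dsn="db.example.com:1521/ORCLPDB1")
with conn.cursor() as cur:
    cur.execute("SELECT log_mode, supplemental_log_data_min FROM v$database")
    log_mode, supplemental_logging = cur.fetchone()
    print(f"LOG_MODE={log_mode}, SUPPLEMENTAL_LOG_DATA_MIN={supplemental_logging}")
    # Expect LOG_MODE to be ARCHIVELOG and supplemental logging to be enabled;
    # otherwise, the batch Oracle connector may be the better fit.
conn.close()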
For Kafka:
- A Kafka cluster with a reachable bootstrap.servers address, an authentication mechanism (such as SASL), and connection security enabled via TLS.
- For the Avro message format, you will also need a schema registry, along with a username and password for authentication (see the example client configuration after these prerequisites).
If you ultimately only want to send your data to a Kafka consumer, consider using an Estuary Dekaf connector instead. This would simplify setup by removing the requirement of maintaining your own Kafka broker and schema registry.
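For reference, the snippet below shows roughly what these connection details look like when expressed as confluent-kafka Python client settings; all values are placeholders, and the exact field names in Estuary’s materialization form may differ.

from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient

kafka_config = {
    "bootstrap.servers": "broker1.example.com:9092",
    "security.protocol": "SASL_SSL",   # TLS-encrypted connection
    "sasl.mechanisms": "PLAIN",        # or SCRAM-SHA-256 / SCRAM-SHA-512
    "sasl.username": "my-kafka-user",
    "sasl.password": "my-kafka-password",
    "group.id": "connectivity-check",
}

# Only required when materializing Avro messages.
schema_registry = SchemaRegistryClient({
    "url": "https://schema-registry.example.com",
    "basic.auth.user.info": "sr-user:sr-password",
})

# Quick connectivity check: list the topics visible to these credentials.
consumer = Consumer(kafka_config)
print(list(consumer.list_topics(timeout=10).topics.keys()))
consumer.close()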
Step 1: Configure Oracle as Your Source
- Sign in to your Estuary Flow account.
- Select the Sources option from the left navigation pane of the Estuary home page.
- From the Sources page, click the + NEW CAPTURE button and search for Oracle using the Search connectors field.
- In the search results, choose the Oracle Database (Real-time, first party) connector and click its Capture button.
- On the Oracle database connector configuration page, you must specify a unique source capture name in the Name field within the Capture Details section.
- Expand the Endpoint Config section and enter the necessary information in the Server Address, User, Password, and Database fields.
- You can also configure advanced options like Backfill Chunk Size, Incremental Chunk Size, and Incremental SCN Range.
- After providing the relevant details, click NEXT > SAVE AND PUBLISH.
Upon completion of configuration, the connector will capture the data from the Oracle database into a Flow collection.
Step 2: Configure Apache Kafka as Your Destination
Once your Oracle data capture is complete, a dialog box will show up with the capture details. You can click the MATERIALIZE COLLECTIONS option in this dialog box to start configuring the destination connector.
An alternative way is to navigate to the Estuary Flow home page and click Destinations > + NEW MATERIALIZATION.
- On the Create Materialization page, type Kafka in the Search connectors box.
- You will see the Apache Kafka Real-time connector in the search results. Click the connector’s Materialization button.
- On the Create Materialization page, specify a unique materialization name in the Name field.
- Expand the Endpoint Config section and enter the required details like Bootstrap Servers, SASL credentials, and Schema Registry information.
- Typically, your Oracle capture data should be automatically added to your Kafka materialization. If not, you can manually link it by clicking the SOURCE FROM CAPTURE button in the Source Collections section and selecting the desired capture.
- Click NEXT > SAVE AND PUBLISH to complete your Kafka materialization configuration.
The Apache Kafka connector will materialize your Oracle data Flow collections into Kafka topics, completing the Oracle to Kafka integration.
Benefits of Using Estuary Flow
Here are some key benefits of using Estuary Flow for Oracle to Kafka streaming:
Simplified Redo Log Management
Oracle redo log files record all database changes to support durability and recovery. However, using redo logs for Oracle CDC to Kafka has several limitations.
As redo logs grow continuously, they consume significant storage. Without proper retention policies, disk usage can balloon, yet CDC still needs to retain enough log history to avoid missing changes. Besides this, redo logs store low-level binary data, requiring tools like Oracle LogMiner to decode changes and extract meaningful information.
How Estuary Flow Helps:
With Estuary’s Oracle source connector, you can simplify redo log management. The connector uses Oracle LogMiner, enabling you to analyze, track, and retrieve database changes from redo log files effortlessly. The captured changes are converted into Flow collections and published to Kafka topics, ensuring a streamlined CDC process without the complexities of manual log handling.
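For a sense of what the connector abstracts away, here is a rough, hypothetical sketch of querying LogMiner directly with python-oracledb. It assumes the relevant redo or archived log files have already been registered with DBMS_LOGMNR.ADD_LOGFILE, the user has the necessary LogMiner privileges, and the SCN range, schema name, and credentials are placeholders.

import oracledb

conn = oracledb.connect(user="flow_capture", password="secret",
                        dsn="db.example.com:1521/ORCLPDB1")
with conn.cursor() as cur:
    # Start a LogMiner session over an SCN window, using the online catalog
    # as the dictionary so SQL_REDO is resolved into readable statements.
    cur.execute("""
        BEGIN
          DBMS_LOGMNR.START_LOGMNR(
            STARTSCN => :start_scn,
            ENDSCN   => :end_scn,
            OPTIONS  => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
        END;""", start_scn=5000000, end_scn=5050000)

    # Each row is one change; SQL_REDO holds the equivalent DML statement.
    cur.execute("""
        SELECT scn, operation, table_name, sql_redo
        FROM v$logmnr_contents
        WHERE seg_owner = :owner AND operation IN ('INSERT', 'UPDATE', 'DELETE')""",
        owner="FLOW_SOURCE")
    for scn, operation, table_name, sql_redo in cur:
        print(scn, operation, table_name, sql_redo)

    cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
conn.close()

In a production CDC pipeline, a loop like this has to run continuously, track its position, and survive log switches, which is exactly the bookkeeping Estuary Flow takes off your hands.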
Efficient SCN-Based CDC Handling
An SCN, or System Change Number, is a unique identifier Oracle assigns to each committed transaction. SCNs help maintain consistent and ordered replication in Oracle CDC workflows.
However, managing SCN-based CDC is challenging: capturing too many changes in a single iteration can overload memory and processing resources, while too small an SCN range may slow down replication and increase latency.
How Estuary Flow Helps:
In Estuary Flow, the Oracle source connector retrieves changes within an SCN range of 50,000 by default. If your Oracle database processes a high volume of events or has very low activity, you can adjust this range using the advanced.incremental_scn_range option. By automatically managing SCN-based CDC, Estuary Flow ensures that Oracle database changes are delivered to Kafka in real-time with minimal latency.
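To illustrate the trade-off, here is a simplified, hypothetical sketch of SCN-windowed reads; read_changes stands in for a LogMiner query like the one sketched earlier, and the SCN values are arbitrary.

def incremental_windows(start_scn: int, current_scn: int, scn_range: int = 50_000):
    """Yield (start, end) SCN windows covering start_scn..current_scn."""
    window_start = start_scn
    while window_start < current_scn:
        window_end = min(window_start + scn_range, current_scn)
        yield window_start, window_end
        window_start = window_end

for lo, hi in incremental_windows(start_scn=5_000_000, current_scn=5_180_000):
    print(f"capture changes in SCN range [{lo}, {hi})")
    # read_changes(lo, hi)  # larger windows use more memory; smaller ones add latency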
Method 2: Manually Move Data from Oracle to Kafka Using Kafka Connect JDBC Connector
Kafka Connect is a data integration framework that facilitates data streaming between Apache Kafka and various external systems like Oracle.
To simplify the Oracle to Kafka data streaming process, you can use platforms like Confluent. It offers connectors for external systems, including Oracle Database, Postgres, and RabbitMQ.
The Kafka Connect approach first loads the existing rows from the Oracle database and then polls for new and updated rows using a timestamp or incrementing column (deletes are not captured). All source data updates are recorded in a Kafka topic, which consumer applications can read to perform event-driven operations.
Before using the Oracle source connector in Confluent, ensure that you satisfy the following prerequisites:
- You must install and configure the Confluent CLI for your Kafka cluster.
- The Oracle database version must be 11.2.0.4 or later and configured with a Pluggable Database (PDB) service name.
- To use Schema Registry-based formats, like Avro, Protobuf, and JSON_SR, you must enable Schema Registry.
- Access to Kafka cluster credentials for authentication.
Step 1: List All the Available Connectors
To check all the available connectors offered by Confluent, open Confluent CLI and run the following command:
confluent connect plugin list
Step 2: Display the Connector Configuration Properties
List the Oracle database source connector’s configuration properties:
confluent connect plugin describe OracleDatabaseSource
The command returns the properties required to configure the connector.
Step 3: Define the Connector Configuration Properties
In this step, create a JSON file named oracle-source.json containing all the connector configuration properties. Here’s an example:
{
"name": "OracleDatabaseSource_0",
"connector.class": "OracleDatabaseSource",
"kafka.auth.mode": "KAFKA_API_KEY",
"kafka.api.key": "<my-kafka-api-key>",
"kafka.api.secret": "<my-kafka-api-secret>",
"topic.prefix": "oracle_",
"connection.host": "<my-database-endpoint>",
"connection.port": "1521",
"connection.user": "<database-username>",
"connection.password": "<database-password>",
"db.name": "db078_pdb1.subnet.vcn.oraclevcn.com",
"table.whitelist": "PASSENGERS",
"timestamp.column.name": "created_at",
"output.data.format": "JSON",
"db.timezone": "UTC",
"tasks.max": "1"
}
You must replace the placeholders with proper access credentials.
- name: Provide a name for the connector.
- connector.class: Identifies the JDBC source connector for Oracle.
- kafka.auth.mode: Specifies the authentication mode, either SERVICE_ACCOUNT or KAFKA_API_KEY. For API key authentication, set the kafka.api.key and kafka.api.secret properties; to use a service account, specify its Resource ID in the kafka.service.account.id property.
- topic.prefix: Provide a topic prefix. Topics are named using the <topic.prefix><tableName> convention and are created with topic.creation.default.partitions=1 and topic.creation.default.replication.factor=3. To use custom properties, create the topics with the desired settings before running this connector.
- output.data.format: Specify the Kafka output data format; valid entries are AVRO, JSON_SR, PROTOBUF, or JSON.
- db.timezone: Specify the database timezone. The default value is UTC.
Step 4: Create the Connector Using the Properties File
To load the configuration properties and initialize the connector, execute:
confluent connect cluster create --config-file oracle-source.json
The command should return a success message.
Step 5: Check the Connector Status
Verify if the connector works as expected by running the following:
confluent connect cluster list
The output is a table listing each connector’s ID, name, status (which should show Running), and type.
Finally, verify that messages are populating the Kafka topic. In a successful pipeline setup, all changes made to the Oracle database will be reflected in the Kafka topic.
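One way to spot-check this is a small consumer, sketched below with the confluent-kafka Python client, pointed at the oracle_PASSENGERS topic (topic.prefix plus table name); the bootstrap server and API credentials are placeholders for your cluster’s values.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<my-bootstrap-server>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<my-kafka-api-key>",
    "sasl.password": "<my-kafka-api-secret>",
    "group.id": "oracle-cdc-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["oracle_PASSENGERS"])

try:
    for _ in range(100):                      # read a handful of records and stop
        msg = consumer.poll(timeout=5.0)
        if msg is None:
            break
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(msg.key(), msg.value().decode("utf-8"))  # JSON, per output.data.format
finally:
    consumer.close()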
Why struggle with complex Kafka connectors or manual configurations? With Estuary Flow, you can set up a real-time Oracle to Kafka pipeline in just a few clicks—no coding required!
Get started today! Try Estuary Flow for free or book a demo with our experts to see it in action.
Oracle to Kafka: Comparing Estuary Flow vs. Kafka Connect
Here is a detailed comparison of Estuary Flow and the Kafka Connect JDBC source connector for Oracle to Kafka integration:
Data Integration Setup
Setting up Confluent’s Oracle JDBC source connector involves complex configurations and coding with continuous maintenance overhead.
On the other hand, by leveraging Estuary Flow’s Oracle and Kafka connectors, you do not need to worry about resource management. Once you configure these connectors, Flow ensures real-time data replication from the Oracle database to Kafka topics.
Automation Capabilities
Although building streaming data pipelines using Confluent CLI is effective, it lacks automation capabilities. Manually defining connectors and creating JSON configuration files can lead to errors that can be time-consuming to resolve.
In contrast, Estuary Flow provides robust automation capabilities, helping you directly capture changes from Oracle’s redo logs without manual intervention. This makes Estuary Flow a more efficient and scalable alternative to the JDBC connector.
Extensibility and Data Transformation
Confluent’s Oracle to Kafka connector is highly customizable but requires manual configuration and external tooling for advanced data transformations. You need to write custom Single Message Transforms (SMTs) or use tools like ksqlDB to modify data before pushing it into Kafka topics.
On the other hand, Estuary Flow allows you to perform real-time data transformations via SQL or TypeScript derivations. These derivations transform the data after it is captured and before it is materialized to Kafka topics, supporting operations like aggregations, joins, windowing, and unnesting within data streams.
Conclusion: The Best Way to Stream Data from Oracle to Kafka
Streaming Oracle data to Kafka is essential for facilitating real-time analytics and event-driven software development.
By creating a streaming data pipeline, you can identify changes made to the Oracle database and replicate them to specific Kafka topics. Consumer applications can then act on these updates, for example by sending email notifications to customers who recently completed a user survey.
You can use Kafka Connect with JDBC for streaming Oracle to Kafka. However, it requires manual configuration of connector properties, adding to the setup complexity. Instead, a CDC-enabled solution like Estuary Flow can simplify the process. This platform automates streaming data and change tracking without requiring any additional tools.
Get in touch with the experts at Estuary to explore how this platform can fit into your data streaming workflows effortlessly.
FAQs
Why is Kafka used to stream data from Oracle?
Kafka allows real-time data streaming by efficiently ingesting, processing, and distributing data at scale. It helps in building mission-critical applications, analytics pipelines, and live monitoring systems.
How do schema changes in Oracle impact Kafka streaming?
Schema changes in Oracle can lead to ingestion failures in Kafka Connect if not managed properly. Using a Schema Registry with Avro, Protobuf, or JSON Schema helps handle schema evolution smoothly. With Estuary Flow, schema evolution is handled automatically.
About the author
With over 15 years in data engineering, the author is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Their writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.