NetSuite to Databricks Integration: A Step-by-Step Guide

Discover the ultimate guide to effortlessly connect NetSuite to Databricks for enhanced data analytics and streamlined operations.


NetSuite is a leading enterprise resource planning (ERP) provider, offering a comprehensive suite of tools to help you manage various business operations. Despite its extensive features, however, NetSuite may not meet every organization’s needs for large-scale data processing and advanced analytics.

If you prioritize data-driven decision-making, Databricks is an impressive choice. Migrating data from NetSuite to Databricks offers numerous benefits, including faster data processing, real-time insights, and access to sophisticated analytical techniques. These benefits can drive business innovation and help you gain a competitive edge.

In this article, you’ll learn how to effectively connect NetSuite to Databricks.

If you prefer to skip the overview and jump right into the step-by-step methods for connecting NetSuite to Databricks, click here to go directly to the instructions.

Overview of NetSuite


NetSuite ERP is a comprehensive cloud business management solution that provides visibility into real-time financial and operational performance.

As an integrated suite of applications, NetSuite is handy for managing accounting, order processing, inventory management, production, supply chain, and warehouse operations. Automating these critical processes gives your business greater control over operations and helps it run more efficiently.

Overview of Databricks


Databricks is a cloud-based platform built on the robust Apache Spark engine. It is designed to handle massive datasets and complex data processing tasks. Databricks provides a centralized workspace that facilitates collaboration among business analysts, developers, and scientists, enabling them to efficiently develop and deploy data-driven applications.

With integrated support for ML frameworks such as TensorFlow, PyTorch, and Scikit-learn, Databricks helps you track experiments and share and deploy ML models.
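
For instance, here is a minimal sketch of what experiment tracking looks like with MLflow and scikit-learn inside a Databricks notebook; the model and parameter values are purely illustrative:

```python
# Minimal MLflow experiment-tracking sketch. In a Databricks notebook the
# tracking server is preconfigured, so runs appear in the workspace UI.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    mlflow.log_params(params)  # record hyperparameters for comparison
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)  # record the evaluation metric
    mlflow.sklearn.log_model(model, "model")  # persist the trained model
```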

A few key features of Databricks include:

  • Databricks is highly scalable, allowing you to manage massive data volumes effortlessly. It also adapts to various data processing workloads, with configurations that support batch processing, real-time streaming, and machine learning.
  • Databricks provides robust security features designed to meet enterprise-level deployment requirements, including support for compliance standards such as GDPR, HIPAA, and SOC 2.
  • Databricks is available on major cloud providers, including AWS, Microsoft Azure, and GCP, allowing you to leverage cloud scalability and flexibility.

Why Integrate NetSuite with Databricks?

Some of the advantages of a Databricks NetSuite integration are listed below:

  • Databricks enables real-time data processing for quick analysis. This is crucial for applications that require real-time customer interaction management and immediate decision-making.
  • Databricks is a modern, single platform for all your analytics and AI use cases, offering a comprehensive environment for data analytics. This is particularly useful for extending NetSuite's analytical functions, allowing more complex and varied data analysis workflows.
  • Databricks accelerates the development of AI and ML capabilities by leveraging collaborative, self-service tools and open-source technologies like MLflow and Apache Spark. This can help provide deeper insights and predictive analytics for your NetSuite data.

How to Connect NetSuite to Databricks: 2 Methods

Let’s look into two methods that can help integrate NetSuite with Databricks:

  • Method 1: Using Estuary Flow for a NetSuite Databricks Connection
  • Method 2: Using CSV Export/Import to Integrate NetSuite with Databricks

Method 1: Using Estuary Flow for a NetSuite Databricks Connection

Estuary Flow is an efficient real-time ETL solution that offers improved scalability, reliability, and integration capabilities. It allows you to transfer data from NetSuite to Databricks with just a few clicks and without in-depth technical expertise.

Some of the important features of Estuary Flow are listed below:

  • Wide Range of Connectors: Estuary Flow provides 200+ pre-built connectors to help establish connections, integrating data from various sources to the destinations of your choice. Owing to the no-code configuration of these connectors, setting up a data pipeline is significantly simplified.
  • Change Data Capture (CDC): Estuary Flow supports CDC for a seamless Databricks NetSuite integration. All updates to the NetSuite database will be immediately reflected in Databricks without requiring human intervention, helping achieve real-time analytics and decision-making.
  • Scalability: Estuary Flow is designed for horizontal scaling to handle varying workloads and data volumes. It can run active workloads from any database at up to 7 GB/s.

Here are the steps to migrate from NetSuite to Databricks using Estuary Flow.

Prerequisites

Before you set up your NetSuite to Databricks pipeline with Estuary Flow, make sure you have the following:

NetSuite

  • An active Oracle NetSuite account.
  • SuiteAnalytics Connect enabled (preferred) or SuiteQL if SuiteAnalytics isn’t available.
  • A custom role with full or view access to Transactions, Reports, Lists, and Setup.
  • A user assigned to this custom role.
  • Token-based authentication credentials: Consumer Key, Consumer Secret, Token ID, and Token Secret.
  • Your Realm/Account ID, e.g., 1234567 for production or 1234567_SB1 for sandbox.
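
Before configuring Estuary Flow, you can optionally sanity-check these credentials by calling NetSuite’s SuiteQL REST endpoint directly. Here is a minimal sketch using the requests and requests-oauthlib packages; the account ID, credential values, and query are placeholders:

```python
# Sanity-check NetSuite token-based authentication with a trivial SuiteQL query.
# pip install requests requests-oauthlib
import requests
from requests_oauthlib import OAuth1

ACCOUNT = "1234567"  # your Realm/Account ID, e.g. "1234567_SB1" for sandbox

auth = OAuth1(
    client_key="CONSUMER_KEY",
    client_secret="CONSUMER_SECRET",
    resource_owner_key="TOKEN_ID",
    resource_owner_secret="TOKEN_SECRET",
    signature_method="HMAC-SHA256",  # NetSuite requires HMAC-SHA256 signatures
    realm=ACCOUNT,
)

# In the hostname, the account ID is lowercased and underscores become hyphens.
host = ACCOUNT.lower().replace("_", "-")
resp = requests.post(
    f"https://{host}.suitetalk.api.netsuite.com/services/rest/query/v1/suiteql",
    auth=auth,
    headers={"Prefer": "transient"},  # required for SuiteQL requests
    params={"limit": 5},
    json={"q": "SELECT id, companyname FROM customer"},
)
resp.raise_for_status()
print(resp.json()["items"])
```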

Databricks

  • A Databricks account with:
    • Unity Catalog
    • A SQL Warehouse
    • A schema in the catalog
  • Authentication credentials: either a Personal Access Token (PAT) or a Service Principal token (admins group only).
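
You can likewise verify the Databricks side ahead of time with the databricks-sql-connector package. A minimal sketch, with the warehouse hostname, HTTP path, token, and catalog name as placeholders:

```python
# Verify Databricks SQL Warehouse access before configuring the materialization.
# pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # warehouse host
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",  # warehouse HTTP path
    access_token="dapiXXXXXXXXXXXXXXXX",  # PAT or service principal token
) as connection:
    with connection.cursor() as cursor:
        # Confirm the Unity Catalog schema Flow will write to is reachable.
        cursor.execute("SHOW SCHEMAS IN main")  # "main" is a placeholder catalog
        for (schema_name,) in cursor.fetchall():
            print(schema_name)
```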

Estuary Flow

  • A free Estuary Flow account.

Step 1: Configure NetSuite as Your Source

  • Log in to your Estuary Flow account.
  • Click Sources on the left-side pane of the dashboard.
  • To set up the source end of the integration pipeline, click + NEW CAPTURE.
  • In the Search connectors box, type NetSuite. When the NetSuite connector appears in the search results, click its Capture button.
  • On the NetSuite connector configuration page, provide the following:
    • Name: A descriptive name for your capture.
    • Account ID (Realm): Your NetSuite account ID, e.g., 1234567 (production) or 1234567_SB1 (sandbox).
    • Connection Type: Choose suiteanalytics (recommended) or suiteql.
    • Role ID: The ID of the custom role you created for Estuary Flow (defaults to 3 for Administrator).
    • Authentication: Enter the token-based authentication values:
      • Consumer Key
      • Consumer Secret
      • Token ID
      • Token Secret 
  • Click NEXT > SAVE AND PUBLISH. The connector will capture data from NetSuite into Flow collections.

How it works:

The NetSuite connector uses SuiteAnalytics Connect to efficiently fetch large volumes of data and automatically discovers available tables, schemas, keys, and cursor fields. It supports incremental updates with log cursors and performs historical backfills with page cursors. Once published, Estuary Flow continuously captures changes from NetSuite into Flow collections.
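
To make the cursor idea concrete, here is a simplified conceptual sketch of incremental capture against a modification timestamp. This is only an illustration of the general technique; Estuary Flow tracks and persists cursor state for you:

```python
# Conceptual sketch of cursor-based incremental capture. This is not Estuary
# Flow's internal implementation, just the idea behind a log cursor.
def build_incremental_query(last_cursor: str) -> str:
    # Fetch only rows modified since the previous sync; ordering by the cursor
    # field lets the newest timestamp become the next cursor value.
    return (
        "SELECT id, tranid, lastmodifieddate FROM transaction "
        f"WHERE lastmodifieddate > TO_DATE('{last_cursor}', 'YYYY-MM-DD HH24:MI:SS') "
        "ORDER BY lastmodifieddate"
    )

print(build_incremental_query("2024-01-01 00:00:00"))
```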

Step 2: Configure Databricks as Your Destination

  • After successfully creating your NetSuite capture, you’ll see a pop-up window with capture details. Click MATERIALIZE COLLECTIONS to begin configuring your destination.
    • Alternatively, navigate to the Estuary dashboard and click Destinations, then + NEW MATERIALIZATION.
  • On the Create Materialization page, type Databricks in the Search connectors box. When the Databricks connector appears in the search results, click its Materialization button.
  • On the Databricks connector configuration page, provide the following:
    • Name: A descriptive name for your materialization.
    • Address: The host and port of your Databricks SQL Warehouse (port 443 is used by default if not specified).
    • HTTP Path: The HTTP path of your SQL Warehouse.
    • Catalog Name: The name of your Unity Catalog.
    • Schema Name: The default schema where tables should be created.
    • Authentication: Choose PAT for a personal access token, or use a Service Principal token if your principal is part of the admins group. Enter the token value in the required field.
  • Once configured, review the Source Collections section to confirm your NetSuite collections are bound to this materialization.
  • Then, click NEXT > SAVE AND PUBLISH to materialize Flow collections of your NetSuite data into Databricks tables.

How it works: 

The Databricks connector stages captured NetSuite data into a Unity Catalog Volume, then transactionally applies the changes into Databricks tables. Sync schedules default to a 30-minute delay but can be adjusted to meet your workload. For very large datasets, you can enable delta updates to reduce costs and latency. Optional column mapping is available if you need to handle schema evolution.
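
Once the materialization is running, you can spot-check the synced data from a Databricks notebook. A hypothetical example, assuming a NetSuite transaction collection was materialized into a catalog named main and a schema named netsuite:

```python
# Spot-check a materialized NetSuite table from a Databricks notebook, where
# the `spark` session is predefined. Catalog, schema, and table are hypothetical.
df = spark.table("main.netsuite.transaction")

print(f"Rows synced so far: {df.count()}")
(
    df.select("id", "trandate", "lastmodifieddate")  # hypothetical column names
    .orderBy("lastmodifieddate", ascending=False)
    .show(5)  # the newest records should track recent NetSuite changes
)
```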


Method 2: Using CSV Export/Import to Integrate NetSuite with Databricks

This method involves extracting data from NetSuite as CSV and loading the extracted data into Databricks.

Step 1: Export NetSuite Data as CSV Files

To export NetSuite data as CSV files, follow the steps below:

  • In your NetSuite account, navigate to Setup > Import/Export > Export Tasks > Full CSV Export.
  • Click Submit. A progress bar that indicates the status of your export will appear.
  • Upon completion of the export process, a File Download window will pop up.
  • In the File Download window, select Save this file to disk, then click OK.
  • A Save As dialog box opens with the File Name field highlighted.
  • Enter the desired file name for your CSV files.
  • Click Save to download your files.
  • The files will be saved in ZIP format. Open the ZIP file to view the titles of the exported CSV files.
  • Use a spreadsheet program or text editor to open and review the CSV file.
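
Rather than opening the archive and each file by hand, a short Python sketch can extract the ZIP and preview every CSV it contains (the archive name is a placeholder):

```python
# Extract the NetSuite full export and preview each CSV before loading it.
# pip install pandas
import zipfile
from pathlib import Path

import pandas as pd

export_zip = Path("netsuite_full_export.zip")  # placeholder file name
out_dir = Path("netsuite_csv")

with zipfile.ZipFile(export_zip) as archive:
    archive.extractall(out_dir)

for csv_path in sorted(out_dir.glob("*.csv")):
    df = pd.read_csv(csv_path, low_memory=False)
    print(f"{csv_path.name}: {len(df)} rows, {len(df.columns)} columns")
```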

Step 2: Import the CSV Files into Databricks

  • In the Databricks workspace, click the Data tab in the sidebar to open the Data View window.
  • Select the CSV files you want to upload by clicking the Upload File button.
  • By default, Databricks assigns a table name based on the file name or format. However, you can change the settings or add a unique table name as required.
  • After a successful upload, select the Data tab in your Databricks workspace to confirm the files are correctly uploaded.
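
If you’d rather script this step than use the UI, one option is to place the CSV files in a Unity Catalog volume and load them with PySpark from a notebook. A sketch, with the volume path and table name as placeholders:

```python
# Load an uploaded CSV into a managed table from a Databricks notebook, where
# the `spark` session is predefined. Volume path and table name are placeholders.
df = (
    spark.read.option("header", True)  # NetSuite exports include a header row
    .option("inferSchema", True)  # let Spark guess column types
    .csv("/Volumes/main/default/raw/Customers.csv")
)

(
    df.write.mode("overwrite")  # replace the table on each manual re-import
    .saveAsTable("main.netsuite_csv.customers")
)
```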

Limitations of Using CSV Files to Migrate from NetSuite to Databricks

Although migrating data using CSV files can be effective for a NetSuite-Databricks integration, there are some associated limitations, including:

  • Resource-Intensive: This method involves manual effort to extract NetSuite data as CSV files and import them into Databricks. It can be significantly time-consuming and resource-intensive, particularly for frequent data transfers.
  • Data Integrity Issues: Due to the manual efforts involved in this method, there’s an increased risk of data errors or inconsistencies, leading to reduced data integrity. 
  • Lacks Real-time Integration: For any updates, deletions, or additions to your NetSuite data, you must either manually update or re-import the entire dataset to reflect the changes in Databricks. As a result, this method isn’t suitable for applications requiring real-time data updates.

Conclusion

Migrating NetSuite data to Databricks enables real-time analytics, machine learning, and scalable data processing. While the CSV export/import method is simple, it’s manual and lacks real-time sync.

With Estuary Flow, you get an automated pipeline powered by SuiteAnalytics capture and Databricks materialization. It handles historical backfills, CDC for incremental updates, and syncs data transactionally through Unity Catalog. Optional features like delta updates and schema evolution make it scalable and cost-efficient.

Looking for a solution for effortless integration between sources and destinations of your choice? Sign up for your Estuary account today! 

FAQs

    Can I use NetSuite SuiteQL instead of SuiteAnalytics?

    Yes. Estuary Flow supports both SuiteAnalytics (preferred for speed and schema discovery) and SuiteQL mode. SuiteQL can be used if you don’t have SuiteAnalytics Connect enabled in your account.

    How does Estuary Flow load NetSuite data into Databricks?

    The connector stages data in a Unity Catalog Volume, then applies it transactionally to Databricks SQL Warehouse tables. You can configure sync schedules and enable delta updates for large, high-volume datasets.

    How do I authenticate the NetSuite connection?

    You can connect using token-based authentication (recommended for security) or username/password credentials. Token-based authentication requires creating a NetSuite integration, role, and user with appropriate permissions.

    Does Estuary Flow handle schema changes in NetSuite?

    Yes. Estuary Flow can evolve schemas automatically and supports column mapping for Databricks. This ensures pipelines remain stable even when fields are added or altered in NetSuite.
