10 min read

Last updated: May 11, 2026

HubSpot to Databricks Integration: 2 Efficient Ways

Explore the effective methods to connect HubSpot to Databricks for efficient analytics and enhanced data processing capabilities.

Dani Pálma, Developer Relations Lead

Organizations that manage large volumes of customer data can benefit significantly from a tool like HubSpot. It offers robust functionality for sales, marketing, and customer service, helping you improve customer relationship management and run targeted marketing campaigns. However, if you need advanced analytics or have to process complex data, consider connecting HubSpot to Databricks.

Such an integration allows you to take advantage of Databricks' powerful capabilities in machine learning, big data analytics, and advanced processing tasks.

This guide walks you through two methods for connecting HubSpot to Databricks: one automated (using Estuary) and one manual (CSV export/import).

If you're ready to connect HubSpot to Databricks, skip ahead to the step-by-step methods further down.

Overview of HubSpot

HubSpot is a popular AI-powered Customer Relationship Management (CRM) platform designed to help you attract, engage, and manage customers, ultimately aiming to boost sales. It offers a comprehensive suite of tools across five main products or Hubs: Marketing Hub, Content Management System (CMS) Hub, Sales Hub, Service Hub, and Operations Hub.

You can utilize several fully functional marketing services and tools within these Hubs to significantly benefit your daily business operations. These services include chatbots, form creation, ads management, email marketing, CRM, web analytics tracking, and customer support.

Some of the features of HubSpot are listed below.

Integrations: HubSpot offers integrations with more than 100 popular apps and web services. You can use them to connect HubSpot with other systems, streamline your workflows, and improve operational efficiency.
List Segmentation: HubSpot allows you to group your contacts with similar qualities based on their customer profiles. Grouping contacts into lists makes it easier for your marketing team to send tailored messages to these individuals.

Overview of Databricks

Databricks is a cloud-based data analytics platform that facilitates the building and deploying of data-driven applications. It provides an integrated workspace where business analysts, developers, and data scientists can collaborate to develop and deploy optimized solutions.

Databricks is designed to simplify the management of large datasets, making it a great choice for real-time analysis, machine learning, and data management. Since Databricks offers seamless integration with various cloud providers, such as Azure, AWS, and GCP, you can achieve a unified analytics experience across different platforms.

Let's look through some of the features of Databricks.

Collaborative Workspace: Databricks offers a shared workspace where you can exchange data, insights, and notebooks with other team members. This collaborative environment supports real-time data processing in data engineering and machine learning projects.
Automation: Databricks offers automation that streamlines the creation, management, and deployment of applications. This includes tools for job scheduling and auto-scaling.

Why Connect HubSpot to Databricks?

Let's look into some reasons to connect HubSpot to Databricks:

Real-Time Processing: You can benefit from Databricks’ real-time data processing feature to enhance your marketing analytics for personalized customer interactions. By analyzing your HubSpot data in Databricks, you can gain real-time insights for optimizing marketing strategies and customer segmentation.
Integration with Machine Learning: Databricks supports various machine learning libraries, like PyTorch and TensorFlow. With a HubSpot to Databricks connection, you can build advanced analytical models to gain deeper insights from your HubSpot data for effective decision-making.
Advanced Analytics: Leverage Databricks' powerful analytics tools to uncover hidden patterns and trends in your HubSpot data.

Let's dive into the details of the 2 HubSpot to Databricks integration methods!

The Automated Way: Using Estuary to Connect HubSpot to Databricks

Estuary captures HubSpot data via the HubSpot API and materializes it into tables in a Databricks SQL Warehouse. Updates in HubSpot are synced to Databricks on the configured schedule (default: every 30 minutes).

Which HubSpot Connector Should You Use?

Estuary provides two separate HubSpot connectors. They have different sync mechanisms, authentication options, and use cases.

Feature	HubSpot Real-Time	HubSpot (Standard)
Sync type	Incremental real-time polling	Batch incremental
Authentication	OAuth2 or Private App Access Token	Private App Access Token only
Recommended for	Production pipelines needing low-latency sync	Simpler setups, less frequent updates
Auto-discovers resources	Yes	Yes
HubSpot API version	Latest native	Legacy

Recommendation: Use the HubSpot Real-Time connector for most production use cases. The standard connector is available for compatibility, but is not recommended for new pipelines.

Let's dive into the complete, step-by-step process for loading your data from HubSpot to Databricks using Estuary.

Prerequisites

HubSpot side (OAuth2 method):

A HubSpot account with the data you want to capture
Your HubSpot user must have access to the objects you want to sync

HubSpot side (Private App Token method):

Your HubSpot user account must have super admin privileges
You must create a private app in HubSpot:
1. In HubSpot, go to Settings > Integrations > Private Apps
2. Click Create a private app
3. Name the app (suggested: "Estuary")
4. Under Scopes, grant Read access for all available scopes
5. Click Create app and copy the generated access token
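
Before pasting the token into Estuary, it can be worth confirming that HubSpot accepts it. The sketch below is one way to do that, not part of the Estuary setup itself: it builds a request for a single contact from HubSpot's CRM v3 contacts endpoint and sends the token as a Bearer header. The function names are hypothetical, and the probe assumes the private app was granted read access to contacts.

```python
import urllib.error
import urllib.request

HUBSPOT_API_BASE = "https://api.hubapi.com"


def build_contacts_probe(access_token: str) -> urllib.request.Request:
    """Build a GET request for one contact, authorized with the private app token."""
    token = access_token.strip()
    if not token:
        raise ValueError("access token is empty")
    url = f"{HUBSPOT_API_BASE}/crm/v3/objects/contacts?limit=1"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_can_read_contacts(access_token: str, timeout: float = 10.0) -> bool:
    """Send the probe (network call) and report whether HubSpot answered with HTTP 200."""
    request = build_contacts_probe(access_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Raised for 4xx/5xx responses, e.g. an invalid token or a missing scope.
        return False
```

Calling `token_can_read_contacts("<your token>")` from a local shell should return `True` for a working token; a `False` result usually points to a copy-paste error or a missing scope.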

Databricks side:

A Databricks workspace on a supported cloud (AWS, Azure, or GCP)
A SQL Warehouse already created in your workspace. The Estuary connector materializes into a SQL Warehouse specifically, not an all-purpose cluster. If you do not have one: in Databricks go to SQL Warehouses > Create SQL Warehouse, configure it, and note the connection details.
Unity Catalog enabled on your workspace (required for the connector's staging mechanism)
A user or service principal with permissions to create tables in the target catalog and schema
A Personal Access Token (PAT) or service principal token. To create a PAT: go to Settings > Developer > Access Tokens > Generate new token

Note on service principal tokens: as of current documentation, only service principals in the Databricks "admins" group can use a token with this connector. If using a service principal, confirm it is added to the admins group under Settings > Identity and access > Groups > admins.

Estuary:

An Estuary account at dashboard.estuary.dev

Step 1: Configuring HubSpot as the Source

Log in to your Estuary account to start configuring HubSpot as the source end of your integration pipeline.
Navigate to Sources and click + New Capture
Search for HubSpot in the connector search box
Select HubSpot Real-Time for production pipelines (recommended), or HubSpot for the standard batch connector

[Screenshot: HubSpot connector search in Estuary]

Give the capture a unique name
Under Authentication, choose your method:
- OAuth2 (recommended): click Sign in with HubSpot and authorize Estuary in the OAuth flow
- Private App Access Token: paste the token you generated in the prerequisites above

[Screenshot: HubSpot connector configuration page in Estuary]

Click Next. Estuary connects to your HubSpot account and automatically discovers all available resources based on your account's data and permissions.
Review the discovered resources. Deselect any you do not need to sync.
Click Save and Publish. Estuary begins the initial backfill immediately.

Step 2: Create the Databricks Materialization

After the capture is published, navigate to Destinations and click + New Materialization
Search for Databricks and click Materialization

[Screenshot: Databricks connector search in Estuary]

Fill in the endpoint configuration. Find these values in your Databricks SQL Warehouse under the Connection Details tab:
- Address: your Databricks workspace hostname (e.g., dbc-abcdefgh-a12b.cloud.databricks.com)
- HTTP Path: the SQL Warehouse HTTP path (e.g., /sql/1.0/warehouses/abcd123efgh4567)
- Catalog Name: the Unity Catalog name (e.g., main)
- Schema Name: the schema within the catalog where tables will be created (default: default)
- Personal Access Token: the PAT or service principal token from the prerequisites

[Screenshot: Databricks connector configuration page in Estuary]

Click Next. Estuary maps HubSpot collections to Databricks tables automatically.
Review the collection-to-table mapping. Each HubSpot resource maps to a separate table.
Click Save and Publish.
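
Most failed first attempts at this step come down to a mistyped endpoint value. As a rough illustration, the endpoint fields from the form above can be sanity-checked before you paste them in. This is a minimal sketch: the `DatabricksEndpoint` class and its checks are hypothetical and only mirror the example formats shown in this guide, not Estuary's own validation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatabricksEndpoint:
    """Hypothetical container for the endpoint fields in the materialization form."""

    address: str
    http_path: str
    catalog_name: str
    schema_name: str = "default"

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the values look plausible."""
        issues = []
        if "://" in self.address or "/" in self.address:
            issues.append("Address should be a bare hostname, without scheme or path")
        if not self.http_path.startswith("/sql/1.0/warehouses/"):
            issues.append("HTTP Path should look like /sql/1.0/warehouses/<warehouse-id>")
        if not self.catalog_name:
            issues.append("Catalog Name is empty")
        if not self.schema_name:
            issues.append("Schema Name is empty")
        return issues


endpoint = DatabricksEndpoint(
    address="dbc-abcdefgh-a12b.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abcd123efgh4567",
    catalog_name="main",
)
print(endpoint.problems() or "endpoint values look plausible")
```

A common slip this catches is copying the full workspace URL (with `https://`) into the Address field instead of the hostname.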

Estuary performs the initial full load, then syncs updates on the configured schedule. The default sync delay is 30 minutes. To adjust this, go to the materialization settings and update the Sync Schedule configuration.

Why the 30-minute delay? Databricks SQL Warehouses incur compute costs per query. Estuary batches updates by default to minimize warehouse runtime. If you need lower latency, reduce the sync interval in the schedule settings, but note this increases warehouse usage and cost. Estuary recommends setting the SQL Warehouse Auto Stop parameter to the minimum available to control costs.
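
To make the trade-off concrete, here is a back-of-the-envelope sketch of how sync interval and Auto Stop interact. It rests on a deliberately simple assumption that is not taken from Estuary or Databricks documentation: each sync keeps the warehouse busy for `run_minutes`, after which it idles for `auto_stop_minutes` before shutting down. Startup time and pricing are ignored, and the input numbers are purely illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def syncs_per_day(sync_interval_minutes: float) -> float:
    """Number of scheduled syncs in 24 hours for a given interval."""
    if sync_interval_minutes <= 0:
        raise ValueError("sync interval must be positive")
    return MINUTES_PER_DAY / sync_interval_minutes


def estimated_warehouse_minutes(
    sync_interval_minutes: float,
    run_minutes: float,
    auto_stop_minutes: float,
) -> float:
    """Rough daily warehouse uptime under the simplified model described above."""
    per_sync = run_minutes + auto_stop_minutes
    if per_sync >= sync_interval_minutes:
        # The next sync arrives before Auto Stop triggers, so the warehouse never stops.
        return float(MINUTES_PER_DAY)
    return syncs_per_day(sync_interval_minutes) * per_sync


# Illustrative inputs: 2-minute syncs and a 10-minute Auto Stop.
for interval in (60, 30, 5):
    uptime = estimated_warehouse_minutes(interval, run_minutes=2, auto_stop_minutes=10)
    print(f"every {interval} min -> about {uptime:.0f} warehouse minutes per day")
```

Under these assumptions, halving the interval roughly doubles uptime until the interval drops below the run time plus Auto Stop window, at which point the warehouse effectively runs all day. That is why a short Auto Stop matters as much as the interval itself.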

💡 Pro-Tip: Managing HubSpot API Limits. Traditional batch tools often "burst" their API calls, hitting HubSpot's rate limits and causing sync failures. Estuary uses incremental polling, which spreads the API load more evenly over time. This helps you stay within your HubSpot tier limits while maintaining near real-time data freshness.
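
The difference between bursting and spreading calls is easy to see with a toy simulation. The sketch below is only an illustration of the idea: the call counts and window size are made-up numbers, not HubSpot's actual limits, and the schedules are not how Estuary's connector is implemented.

```python
def peak_calls_in_window(call_times: list[float], window_seconds: float) -> int:
    """Largest number of calls that fall inside any sliding window of the given length."""
    times = sorted(call_times)
    peak = 0
    start = 0
    for end, current in enumerate(times):
        # Advance the window start until it is within window_seconds of the current call.
        while current - times[start] >= window_seconds:
            start += 1
        peak = max(peak, end - start + 1)
    return peak


def burst_schedule(total_calls: int, burst_seconds: float = 1.0) -> list[float]:
    """All calls squeezed into a short burst at the start of the period."""
    step = burst_seconds / total_calls
    return [i * step for i in range(total_calls)]


def spread_schedule(total_calls: int, period_seconds: float) -> list[float]:
    """The same calls spaced evenly across the whole period."""
    step = period_seconds / total_calls
    return [i * step for i in range(total_calls)]


# Hypothetical workload: 600 API calls per 30-minute period, judged in 10-second windows.
calls, period, window = 600, 1800.0, 10.0
print("burst peak: ", peak_calls_in_window(burst_schedule(calls), window))
print("spread peak:", peak_calls_in_window(spread_schedule(calls, period), window))
```

The total number of calls is identical in both schedules; only the peak per window changes, and the peak is what a rate limiter measures.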

What lands in Databricks

Each HubSpot resource becomes a table in your specified catalog and schema. For example:

main.default.contacts
main.default.deals
main.default.engagements

Estuary handles table creation automatically. If a table already exists, it applies incremental updates using standard merge behavior.
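
If "merge behavior" is unfamiliar, the idea is an upsert: rows whose key already exists are updated, and rows with a new key are inserted. The following is a small in-memory sketch of that idea using plain Python dictionaries. It is an analogy only, it assumes each row has an `id` key, and it says nothing about how the connector applies changes internally.

```python
def qualified_table_name(catalog: str, schema: str, resource: str) -> str:
    """Compose the three-level name under which a resource's table appears."""
    return f"{catalog}.{schema}.{resource}"


def merge_rows(table: list[dict], updates: list[dict], key: str = "id") -> list[dict]:
    """Upsert: replace rows whose key matches an update, append rows with new keys."""
    merged = {row[key]: row for row in table}
    for row in updates:
        merged[row[key]] = row  # matched keys are updated, unmatched keys are inserted
    return list(merged.values())


deals = [
    {"id": 1, "stage": "new"},
    {"id": 2, "stage": "won"},
]
changes = [
    {"id": 1, "stage": "qualified"},  # existing deal moved to a new stage
    {"id": 3, "stage": "new"},        # deal created since the last sync
]

print(qualified_table_name("main", "default", "deals"))
for row in merge_rows(deals, changes):
    print(row)
```

After the merge, the table holds exactly one row per key, which is what keeps repeated syncs from creating duplicates.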

The Manual Way: Using CSV Export/Import to Connect HubSpot to Databricks

For one-time analysis or small datasets, exporting from HubSpot as CSV and uploading to Databricks manually is the simplest path.

When to use: one-off data pulls, small datasets, ad hoc analysis that does not need ongoing sync.

Limitations: requires manual repetition every time data needs refreshing, no incremental updates, risk of data inconsistency between HubSpot and Databricks over time.

Step 1: Export HubSpot Data as CSV

Log in to your HubSpot account
Navigate to the object you want to export (e.g., CRM > Contacts or CRM > Deals)
Click Actions in the top right, then Export view
Choose CSV as the file format
Select the properties to include and the column header language
Click Export. HubSpot sends a download link to your email when the file is ready.
Download the CSV file
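
Exported column headers are human-readable property labels, which may include spaces and mixed case. If you would rather end up with predictable snake_case column names in Databricks, one optional step is to rewrite the header row before uploading. The sketch below uses only Python's standard library; the function names and the sample headers are illustrative assumptions, not part of HubSpot's or Databricks' tooling.

```python
import csv
import io
import re


def normalize_header(name: str) -> str:
    """Lowercase a header and collapse runs of non-alphanumeric characters into underscores."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    return cleaned or "column"


def normalize_headers(headers: list[str]) -> list[str]:
    """Normalize every header, adding a numeric suffix when a name repeats."""
    seen: dict[str, int] = {}
    result = []
    for header in headers:
        base = normalize_header(header)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}_{count + 1}")
    return result


def rewrite_csv_with_clean_headers(source, target) -> None:
    """Copy CSV data from source to target, replacing only the header row."""
    reader = csv.reader(source)
    writer = csv.writer(target)
    writer.writerow(normalize_headers(next(reader)))
    writer.writerows(reader)


# Tiny in-memory example with made-up data.
sample = io.StringIO("First Name,Email,Record ID\r\nAda,ada@example.com,101\r\n")
cleaned = io.StringIO()
rewrite_csv_with_clean_headers(sample, cleaned)
print(cleaned.getvalue())
```

For a real export, open the downloaded file and a new output file with `newline=""` and pass both to `rewrite_csv_with_clean_headers`, then upload the output file instead.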

Step 2: Upload CSV to Databricks

The current Databricks UI uses the following path for CSV uploads (note: the legacy "Data tab > Upload File" workflow has been replaced in current Databricks versions):

In your Databricks workspace, click + New in the sidebar, then select Add or upload data
Select Upload files and drag your CSV file into the upload area
Databricks previews the data and auto-detects column types. Review the schema before proceeding.
Set the Catalog, Schema, and Table name for the destination table
Click Create table

Alternatively, if you prefer the Catalog explorer:

Navigate to Catalog in the left sidebar
Select your target catalog and schema
Click Create > Create table from file upload
Follow the same upload and schema review steps

Step 3: Verify the Data

After the table is created, run a basic validation query in a Databricks notebook or the SQL editor:

```sql
-- Confirm row count
SELECT COUNT(*) FROM your_catalog.your_schema.your_table;

-- Preview sample rows
SELECT * FROM your_catalog.your_schema.your_table LIMIT 10;
```

Cross-check the row count against the HubSpot export to confirm nothing was lost during upload.
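
To get the number to compare against, count the records in the exported file rather than its lines: a quoted field such as a multi-line note spans several physical lines but is still one record. Here is a minimal sketch using Python's `csv` module. The helper names are hypothetical, and the `utf-8-sig` encoding is an assumption chosen because it tolerates a byte-order mark if the export contains one.

```python
import csv
import io


def count_csv_records(lines) -> int:
    """Count data records in CSV input, excluding the header row and blank lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if any
    return sum(1 for row in reader if row)


def count_csv_file(path: str) -> int:
    """Count data records in a CSV file on disk."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return count_csv_records(handle)


def counts_match(csv_records: int, table_rows: int) -> bool:
    """Compare the export's record count with the COUNT(*) result from Databricks."""
    return csv_records == table_rows


# Made-up sample: two records, one of which contains a line break inside quotes.
sample = io.StringIO('Name,Notes\nAda,"line one\nline two"\nGrace,ok\n')
records = count_csv_records(sample)
print(records, counts_match(records, table_rows=2))
```

If the counts differ, check first for rows that Databricks may have split or dropped because of embedded line breaks or unexpected delimiters.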

Conclusion

Connecting HubSpot to Databricks enables you to leverage Databricks' advanced analytical capabilities for your HubSpot customer data. This integration optimizes customer analysis to enhance overall sales and marketing strategies and maximize profit. You can effectively gain data-driven insights with Databricks' robust data processing and machine learning abilities.

You have seen two different methods to connect HubSpot to Databricks. One is the manual method through CSV export/import, which is time-consuming and does not support real-time data integration. The other is the automated method using Estuary, which is highly scalable and comes with robust features like Change Data Capture functionality and built-in connectors.

Automate your data integration with Estuary and utilize its robust features, CDC support, and impressive connector set. Sign in now to get started!

If you’re interested in connecting other platforms to your data warehouse, you might find our guides for HubSpot to BigQuery and HubSpot to Snowflake useful. These resources demonstrate how to achieve seamless integrations across various sources to unlock the full potential of your data.

FAQs

What HubSpot permissions does the private app token need?

Grant Read access for all available scopes when creating the private app. Restricting scopes will prevent the connector from discovering and syncing those resources. Your HubSpot user must also have super admin privileges to create the private app.

Does the Databricks connector require Unity Catalog?

Yes. Estuary's Databricks connector uses Unity Catalog Volumes as a staging area before transactionally applying changes to tables. Unity Catalog must be enabled on your Databricks workspace.

Can I sync specific HubSpot objects only?

Yes. During capture setup, Estuary auto-discovers all available resources and presents them as a list. Deselect any resources you do not want to sync before clicking Save and Publish. You can also edit the capture later to add or remove bindings.

What happens if my Databricks SQL Warehouse is stopped when a sync runs?

The connector will trigger the warehouse to start automatically when a sync is scheduled. This is why Estuary recommends setting the Auto Stop parameter to the minimum available — the warehouse can stop between syncs and restart on demand, minimizing idle compute costs.

About the author

Dani Pálma, Developer Relations Lead

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.