
Organizations managing large volumes of customer data can benefit significantly from tools like HubSpot. It offers robust functionality for sales, marketing, and customer service, helping improve customer relationship management and support targeted marketing efforts. However, if you need advanced analytics or complex data processing, consider connecting HubSpot to Databricks.
Such an integration allows you to take advantage of Databricks' powerful capabilities in machine learning, big data analytics, and advanced processing tasks.
This guide walks you through two methods for connecting HubSpot to Databricks: one automated (using Estuary) and one manual (CSV export/import).
Overview of HubSpot
HubSpot is a popular AI-powered Customer Relationship Management (CRM) platform designed to help you attract, engage, and manage customers, ultimately aiming to boost sales. It offers a comprehensive suite of tools across five main products or Hubs: Marketing Hub, Content Management System (CMS) Hub, Sales Hub, Service Hub, and Operations Hub.
You can utilize several fully functional marketing services and tools within these Hubs to significantly benefit your daily business operations. These services include chatbots, form creation, ads management, email marketing, CRM, web analytics tracking, and customer support.
Some of the features of HubSpot are listed below.
- Integrations: HubSpot offers integrations with more than 100 popular apps and web services. You can use them to connect HubSpot with other systems, streamlining your workflows and improving operational efficiency.
- List Segmentation: HubSpot allows you to group your contacts with similar qualities based on their customer profiles. Grouping contacts into lists makes it easier for your marketing team to send tailored messages to these individuals.
Overview of Databricks
Databricks is a cloud-based data analytics platform that facilitates the building and deploying of data-driven applications. It provides an integrated workspace where business analysts, developers, and data scientists can collaborate to develop and deploy optimized solutions.
Databricks is designed to simplify the management of large datasets, making it a great choice for real-time analysis, machine learning, and data management. Since Databricks offers seamless integration with various cloud providers, such as Azure, AWS, and GCP, you can achieve a unified analytics experience across different platforms.
Let's look through some of the features of Databricks.
- Collaborative Workspace: Databricks offers a shared workspace where you can exchange data, insights, and notebooks with other team members. This collaborative environment supports real-time data processing in data engineering and machine learning projects.
- Automation: Databricks offers automation that streamlines the creation, management, and deployment of applications. This includes tools for job scheduling and auto-scaling.
What is the Need to Connect HubSpot to Databricks?
Let's look into some reasons to connect HubSpot to Databricks:
- Real-Time Processing: You can benefit from Databricks’ real-time data processing feature to enhance your marketing analytics for personalized customer interactions. By analyzing your HubSpot data in Databricks, you can gain real-time insights for optimizing marketing strategies and customer segmentation.
- Integration with Machine Learning: Databricks supports various machine learning libraries, like PyTorch and TensorFlow. With a HubSpot to Databricks connection, you can build advanced analytical models to gain deeper insights from your HubSpot data for effective decision-making.
- Advanced Analytics: Leverage Databricks' powerful analytics tools to unlock hidden patterns and trends in your HubSpot data.
Let's dive into the details of the two HubSpot to Databricks integration methods!
The Automated Way: Using Estuary to Connect HubSpot to Databricks
Estuary captures HubSpot data via the HubSpot API and materializes it into tables in a Databricks SQL Warehouse. Updates in HubSpot are synced to Databricks on the configured schedule (default: every 30 minutes).
Which HubSpot Connector Should You Use?
Estuary provides two separate HubSpot connectors. They have different sync mechanisms, authentication options, and use cases.
| | HubSpot Real-Time | HubSpot (Standard) |
|---|---|---|
| Sync type | Incremental real-time polling | Batch incremental |
| Authentication | OAuth2 or Private App Access Token | Private App Access Token only |
| Recommended for | Production pipelines needing low-latency sync | Simpler setups, less frequent updates |
| Auto-discovers resources | Yes | Yes |
| HubSpot API version | Latest native | Legacy |
Recommendation: Use the HubSpot Real-Time connector for most production use cases. The standard connector is available for compatibility, but is not recommended for new pipelines.
Let's dive into the complete, step-by-step process for loading your data from HubSpot to Databricks using Estuary.
Prerequisites
HubSpot side (OAuth2 method):
- A HubSpot account with the data you want to capture
- Your HubSpot user must have access to the objects you want to sync
HubSpot side (Private App Token method):
- Your HubSpot user account must have super admin privileges
- You must create a private app in HubSpot:
  - In HubSpot, go to Settings > Integrations > Private Apps
  - Click Create a private app
  - Name the app (suggested: "Estuary")
  - Under Scopes, grant Read access for all available scopes
  - Click Create app and copy the generated access token
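Before pasting the token into Estuary, it can help to confirm that HubSpot accepts it. The sketch below is one way to do that with only the Python standard library. It assumes the app was granted read access to contacts and uses HubSpot's public CRM v3 contacts endpoint with a Bearer header; the function names `build_token_check_request` and `token_is_accepted` are hypothetical helpers, not part of any HubSpot or Estuary SDK.

```python
import urllib.error
import urllib.request

# Public HubSpot CRM v3 endpoint for listing contacts; limit=1 keeps the check cheap.
HUBSPOT_CONTACTS_URL = "https://api.hubapi.com/crm/v3/objects/contacts?limit=1"


def build_token_check_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request authenticated with a private app token."""
    token = access_token.strip()
    if not token:
        raise ValueError("access token is empty")
    return urllib.request.Request(
        HUBSPOT_CONTACTS_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_accepted(access_token: str) -> bool:
    """Send the request; return True only if HubSpot answers with HTTP 200."""
    request = build_token_check_request(access_token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Any HTTP error status (for example an authentication failure) counts as rejected.
        return False
```

Calling `token_is_accepted("<your token>")` from a local shell is enough for a quick check. A `False` result usually points to a mistyped token or a missing scope.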
Databricks side:
- A Databricks workspace on a supported cloud (AWS, Azure, or GCP)
- A SQL Warehouse already created in your workspace. The Estuary connector materializes into a SQL Warehouse specifically, not an all-purpose cluster. If you do not have one: in Databricks go to SQL Warehouses > Create SQL Warehouse, configure it, and note the connection details.
- A Unity Catalog enabled on your workspace (required for the connector's staging mechanism)
- A user or service principal with permissions to create tables in the target catalog and schema
- A Personal Access Token (PAT) or service principal token. To create a PAT: go to Settings > Developer > Access Tokens > Generate new token
Note on service principal tokens: as of current documentation, only service principals in the Databricks "admins" group can use a token with this connector. If using a service principal, confirm it is added to the admins group under Settings > Identity and access > Groups > admins.
Estuary:
- An Estuary account at dashboard.estuary.dev
Step 1: Configuring HubSpot as the Source
- Log in to your Estuary account to start configuring HubSpot as the source end of your integration pipeline.
- Navigate to Sources and click + New Capture
- Search for HubSpot in the connector search box
- Select HubSpot Real-Time for production pipelines (recommended), or HubSpot for the standard batch connector
- Give the capture a unique name
- Under Authentication, choose your method:
  - OAuth2 (recommended): click Sign in with HubSpot and authorize Estuary in the OAuth flow
  - Private App Access Token: paste the token you generated in the prerequisites above
- Click Next. Estuary connects to your HubSpot account and automatically discovers all available resources based on your account's data and permissions.
- Review the discovered resources. Deselect any you do not need to sync.
- Click Save and Publish. Estuary begins the initial backfill immediately.
Step 2: Create the Databricks Materialization
- After the capture is published, navigate to Destinations and click + New Materialization
- Search for Databricks and click Materialization
- Fill in the endpoint configuration. Find these values in your Databricks SQL Warehouse under the Connection Details tab:
  - Address: your Databricks workspace hostname (e.g., `dbc-abcdefgh-a12b.cloud.databricks.com`)
  - HTTP Path: the SQL Warehouse HTTP path (e.g., `/sql/1.0/warehouses/abcd123efgh4567`)
  - Catalog Name: the Unity Catalog name (e.g., `main`)
  - Schema Name: the schema within the catalog where tables will be created (default: `default`)
  - Personal Access Token: the PAT or service principal token from the prerequisites
- Click Next. Estuary maps HubSpot collections to Databricks tables automatically.
- Review the collection-to-table mapping. Each HubSpot resource maps to a separate table.
- Click Save and Publish.
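Typos in the endpoint fields are a common reason a materialization fails to connect. As a rough illustration, the sketch below checks the values before you paste them into the form. `DatabricksEndpoint` is a hypothetical helper class, not part of Estuary or Databricks, and the HTTP path check is a heuristic based only on the example path shown in the steps above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksEndpoint:
    """Hypothetical container for the values entered in the endpoint form."""

    address: str
    http_path: str
    catalog_name: str
    schema_name: str = "default"

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        # The address field expects a bare hostname, not a URL.
        if "://" in self.address or "/" in self.address:
            issues.append("address should be a bare hostname, without scheme or path")
        # Heuristic: SQL Warehouse paths in this guide start with this prefix.
        if not self.http_path.startswith("/sql/1.0/warehouses/"):
            issues.append("http_path does not look like a SQL Warehouse path")
        if not self.catalog_name:
            issues.append("catalog_name is empty")
        if not self.schema_name:
            issues.append("schema_name is empty")
        return issues

    def table_name(self, resource: str) -> str:
        """Fully qualified name of the table a resource would land in."""
        return f"{self.catalog_name}.{self.schema_name}.{resource}"
```

With the example values from the steps above, `problems()` returns an empty list and `table_name("contacts")` returns `main.default.contacts`.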
Estuary performs the initial full load, then syncs updates on the configured schedule. The default sync delay is 30 minutes. To adjust this, go to the materialization settings and update the Sync Schedule configuration.
Why the 30-minute delay? Databricks SQL Warehouses incur compute costs per query. Estuary batches updates by default to minimize warehouse runtime. If you need lower latency, reduce the sync interval in the schedule settings, but note this increases warehouse usage and cost. Estuary recommends setting the SQL Warehouse Auto Stop parameter to the minimum available to control costs.
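To make the trade-off concrete, the small calculation below shows how the number of scheduled sync runs per day grows as the interval shrinks. It is only arithmetic on the interval: it does not estimate cost, which depends on your warehouse size and Auto Stop setting. The function name `syncs_per_day` is made up for this example.

```python
def syncs_per_day(interval_minutes: float) -> float:
    """Upper bound on scheduled sync runs in 24 hours for a given interval."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    minutes_per_day = 24 * 60
    return minutes_per_day / interval_minutes
```

For example, `syncs_per_day(30)` gives 48 runs, while `syncs_per_day(5)` gives 288, six times as many opportunities for the warehouse to start up.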
💡 Pro-Tip: Managing HubSpot API Limits. Traditional batch tools often 'burst' their API calls, hitting HubSpot's rate limits and causing sync failures. Estuary uses incremental polling, which spreads the API load evenly over time. This ensures you stay within your HubSpot tier limits while maintaining near real-time data freshness.
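The general idea behind incremental polling can be sketched in a few lines: remember a cursor (the latest modification time seen so far) and, on each poll, fetch only records changed after it. The code below is an in-memory illustration of that idea, not Estuary's implementation; the record shape and the `updated_at` field name are assumptions made for the example.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Record = Dict[str, object]


def poll_changes(records: List[Record], cursor: datetime) -> Tuple[List[Record], datetime]:
    """Return records modified after the cursor, oldest first, plus the advanced cursor."""
    changed = [r for r in records if r["updated_at"] > cursor]
    changed.sort(key=lambda r: r["updated_at"])
    # If nothing changed, the cursor stays where it was.
    new_cursor = changed[-1]["updated_at"] if changed else cursor
    return changed, new_cursor
```

Because each poll only asks for what changed since the previous one, the work per poll stays small and roughly constant instead of arriving in one large burst.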
What lands in Databricks
Each HubSpot resource becomes a table in your specified catalog and schema. For example:
- main.default.contacts
- main.default.deals
- main.default.engagements
Estuary handles table creation automatically. If a table already exists, it applies incremental updates using standard merge behavior.
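Merge (upsert) behavior means a row whose key already exists is overwritten, and a row with a new key is appended. The sketch below mimics that with plain dictionaries to show the semantics only; it assumes each record carries an `id` key and is not how the connector is implemented.

```python
from typing import Dict, Iterable


def merge_rows(table: Dict[str, dict], updates: Iterable[dict], key: str = "id") -> Dict[str, dict]:
    """Return a new table where each update replaces or adds the row with the same key."""
    merged = dict(table)  # shallow copy so the input table is left untouched
    for row in updates:
        merged[row[key]] = row
    return merged
```

Applying an update for an existing contact changes that one row, while the total row count only grows when a previously unseen `id` arrives.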
The Manual Way: Using CSV Export/Import to Connect HubSpot to Databricks
For one-time analysis or small datasets, exporting from HubSpot as CSV and uploading to Databricks manually is the simplest path.
When to use: one-off data pulls, small datasets, ad hoc analysis that does not need ongoing sync.
Limitations: requires manual repetition every time data needs refreshing, no incremental updates, risk of data inconsistency between HubSpot and Databricks over time.
Step 1: Export HubSpot Data as CSV
- Log in to your HubSpot account
- Navigate to the object you want to export (e.g., CRM > Contacts or CRM > Deals)
- Click Actions in the top right, then Export view
- Choose CSV as the file format
- Select the properties to include and the column header language
- Click Export. HubSpot sends a download link to your email when the file is ready.
- Download the CSV file
Step 2: Upload CSV to Databricks
The current Databricks UI uses the following path for CSV uploads (note: the legacy "Data tab > Upload File" workflow has been replaced in current Databricks versions):
- In your Databricks workspace, click + New in the sidebar, then select Add or upload data
- Select Upload files and drag your CSV file into the upload area
- Databricks previews the data and auto-detects column types. Review the schema before proceeding.
- Set the Catalog, Schema, and Table name for the destination table
- Click Create table
Alternatively, if you prefer the Catalog explorer:
- Navigate to Catalog in the left sidebar
- Select your target catalog and schema
- Click Create > Create table from file upload
- Follow the same upload and schema review steps
Step 3: Verify the Data
After the table is created, run a basic validation query in a Databricks notebook or the SQL editor:
```sql
-- Confirm row count
SELECT COUNT(*) FROM your_catalog.your_schema.your_table;

-- Preview sample rows
SELECT * FROM your_catalog.your_schema.your_table LIMIT 10;
```
Cross-check the row count against the HubSpot export to confirm nothing was lost during upload.
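To get the expected count from the export itself, count records rather than lines, since a quoted field such as a note or description can span several lines. The following is a minimal sketch using Python's standard `csv` module; it assumes the file has a single header row and is UTF-8 encoded, and `count_csv_records` is just an illustrative name.

```python
import csv


def count_csv_records(path: str) -> int:
    """Count data records in a CSV file, excluding the header row."""
    # utf-8-sig transparently drops a byte order mark if the export has one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        # Blank lines parse as empty lists and are not counted.
        return sum(1 for row in reader if row)
```

The number returned should match the `COUNT(*)` result from the validation query. A mismatch suggests rows were dropped or split during upload.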
Conclusion
Connecting HubSpot to Databricks enables you to leverage Databricks' advanced analytical capabilities for your HubSpot customer data. This integration optimizes customer analysis to enhance overall sales and marketing strategies and maximize profit. You can effectively gain data-driven insights with Databricks' robust data processing and machine learning abilities.
You have seen two different methods to connect HubSpot to Databricks. One is the manual method through CSV export/import, which is time-consuming and does not support real-time data integration. The other is the automated method using Estuary, which is highly scalable and comes with robust features like Change Data Capture functionality and built-in connectors.
Automate your data integration with Estuary and utilize its robust features, CDC support, and impressive connector set. Sign in now to get started!
If you’re interested in connecting other platforms to your data warehouse, you might find our guides for HubSpot to BigQuery and HubSpot to Snowflake useful. These resources demonstrate how to achieve seamless integrations across various sources to unlock the full potential of your data.
FAQs
Does the Databricks connector require Unity Catalog?
Can I sync specific HubSpot objects only?
What happens if my Databricks SQL Warehouse is stopped when a sync runs?

About the author
Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Daniel focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.











