SharePoint to Snowflake: 3 Practical Ways to Load Data

Learn how to load data from Microsoft SharePoint into Snowflake using three practical approaches. Includes a step-by-step guide with Estuary, plus Azure Data Factory and custom options.

This tutorial shows how to load data from Microsoft SharePoint into Snowflake in a reliable, production-ready way.

It covers three practical methods used by data teams today, starting with Estuary, which captures files from SharePoint document libraries and writes them directly into Snowflake tables.

The goal is to help you choose the right approach based on setup time, data freshness, and ongoing operational effort.

Key Takeaways

  • SharePoint document libraries can be loaded into Snowflake without building custom ingestion code

  • Estuary provides a managed way to capture files from SharePoint and materialize them into Snowflake tables

  • File formats such as CSV, JSON, and compressed archives can be parsed and structured automatically

  • Snowflake ingestion supports incremental updates and Snowpipe Streaming for low-latency loads

  • Alternative native and DIY approaches exist but require more setup and ongoing operational effort

Method 1: Load Data from SharePoint to Snowflake Using Estuary (Step-by-step)

This method uses Estuary, the right-time data platform, to capture files from Microsoft SharePoint document libraries and load them into Snowflake tables. Estuary structures SharePoint files as collections and materializes those collections into Snowflake using the appropriate ingestion method based on your configuration.

Step 1: Create and Configure a SharePoint Capture

In the Estuary dashboard, use the left-hand navigation to go to Sources, then click +New Capture.

1. Select the SharePoint connector

Setup SharePoint Connector to move data into Snowflake
  • In the connector selection screen, search for SharePoint
  • Select the SharePoint source connector
  • Enter a capture name
  • Select the data plane where the capture will run

This connector reads files from SharePoint document libraries and converts them into structured records.

2. Authenticate with Microsoft

Configure SharePoint Connector to connect with Snowflake
  • In the authentication section, click Authenticate your Microsoft account
  • Complete the OAuth2 sign-in flow in the popup window
  • Grant Estuary permission to access SharePoint Online

Authentication is handled using OAuth2 and is managed directly in the Estuary web application.

3. Configure the SharePoint site and folder

In the Site Configuration section, choose the URL method, which is the recommended option.

  • In the Site URL field, paste the full SharePoint folder URL, for example:
```plaintext
https://contoso.sharepoint.com/sites/Marketing/Shared Documents/quarterly-reports
```

Estuary automatically parses the URL to determine:

  • The SharePoint site
  • The document library
  • The folder path to monitor
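
To make the URL-to-components mapping concrete, here is a minimal illustrative sketch in Python. It is not Estuary's actual parser; it simply splits the example URL above into the site, document library, and folder path that Estuary derives.

```python
# Illustrative only: how a SharePoint folder URL breaks down into the pieces
# Estuary derives. Not Estuary's implementation; it just demonstrates the URL
# structure using Python's standard library.
from urllib.parse import urlparse, unquote

url = "https://contoso.sharepoint.com/sites/Marketing/Shared Documents/quarterly-reports"

parts = [unquote(p) for p in urlparse(url).path.split("/") if p]
# parts -> ['sites', 'Marketing', 'Shared Documents', 'quarterly-reports']

site = "/".join(parts[:2])    # 'sites/Marketing'   -> the SharePoint site
library = parts[2]            # 'Shared Documents'  -> the document library
folder = "/".join(parts[3:])  # 'quarterly-reports' -> the folder to monitor

print(site, library, folder, sep=" | ")
```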

The Components method is also available for advanced use cases where the site ID and drive ID are already known, but it is not required for most SharePoint to Snowflake pipelines.

4. Configure optional file filtering and parsing

In the same capture configuration screen, you can optionally refine which files are captured and how they are parsed.

  • Match Keys

Use a regular expression to filter files by path or extension. For example, this regex captures only CSV files (a small local test of this filter appears after this section):

```plaintext
.*\.csv
```
  • Parser Configuration

By default, Estuary automatically detects:

  • File formats such as CSV, JSON, Avro, Protobuf, or W3C logs
  • Compression such as ZIP, GZIP, or ZSTD

If automatic detection is not sufficient, you can explicitly configure:

  • File format
  • Compression type
  • CSV-specific settings such as delimiter, headers, encoding, quoting, and line endings
For most SharePoint document libraries, automatic parsing works without additional configuration.
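
If you want to sanity-check a Match Keys expression before publishing the capture, a short Python sketch like the one below previews which file paths a pattern such as `.*\.csv` would select. This is a local illustration using the standard `re` module, not Estuary code, and Estuary's exact matching semantics may differ slightly.

```python
# Illustrative only: preview which file paths a Match Keys regex would select.
# Estuary evaluates the expression server-side; this mirrors the behavior with
# Python's re module so you can test patterns locally before publishing.
import re

match_keys = re.compile(r".*\.csv")

candidate_files = [
    "quarterly-reports/q1-sales.csv",
    "quarterly-reports/q1-sales.xlsx",
    "quarterly-reports/archive/q4-sales.csv",
    "quarterly-reports/readme.txt",
]

selected = [path for path in candidate_files if match_keys.fullmatch(path)]
print(selected)
# ['quarterly-reports/q1-sales.csv', 'quarterly-reports/archive/q4-sales.csv']
```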

5. Publish the capture

  • Review the capture configuration
  • Click Publish to activate the capture

Once published:

  • Estuary scans the specified SharePoint folder
  • New and updated files are detected during each sync
  • File contents are parsed into structured records and written to Estuary collections

Step 2: Verify the Captured Collections

After publishing the capture, use the left-hand navigation to go to Collections.

  • Locate the collection created by the SharePoint capture
  • Open the collection to preview sample records

Each collection contains:

  • Structured records derived from SharePoint files
  • An inferred JSON schema
  • Metadata such as publish timestamps

Verifying collections at this stage ensures the SharePoint data is correctly parsed before loading it into Snowflake.

Step 3: Create and Configure a Snowflake Materialization

In the Estuary dashboard, use the left-hand navigation to go to Destinations, then click New Materialization.

1. Select the Snowflake connector

  • Search for Snowflake
  • Select the Snowflake materialization connector
  • Enter a materialization name
  • Select the data plane

2. Configure Snowflake connection details

Configure Snowflake Connector to get data from SharePoint

In the Endpoint Configuration section, provide the Snowflake connection details:

  • Host: Your Snowflake account URL without the protocol (Example: orgname-accountname.snowflakecomputing.com)
  • Database: Name of the target Snowflake database
  • Schema: Target schema where tables will be created
  • Warehouse: Virtual warehouse used for ingestion
  • Role: Role assigned to the Snowflake user

Select the appropriate timestamp type mapping based on your Snowflake environment.

3. Authenticate using key-pair (JWT)

In the Credentials section:

  • Enter the Snowflake username
  • Paste or upload the private key associated with the user

Snowflake has deprecated simple user/password authentication and does not support it for Snowpipe Streaming.
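
Before publishing the materialization, it can help to confirm that the key pair, role, and warehouse work outside of Estuary. The sketch below uses the snowflake-connector-python and cryptography packages; the account, user, key path, and object names are placeholders for your own values.

```python
# Sketch: verify Snowflake key-pair (JWT) credentials independently of Estuary.
# Requires: pip install snowflake-connector-python cryptography
# All identifiers below (account, user, key path, warehouse, etc.) are placeholders.
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the PKCS#8 private key and convert it to the DER bytes the connector expects.
with open("rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="orgname-accountname",   # host minus ".snowflakecomputing.com"
    user="ESTUARY_USER",
    private_key=private_key_der,
    warehouse="INGEST_WH",
    database="ANALYTICS",
    schema="SHAREPOINT",
    role="ESTUARY_ROLE",
)

print(conn.cursor().execute("SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE()").fetchone())
```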

4. Configure sync behavior and ingestion mode

Estuary supports multiple Snowflake ingestion methods, selected based on your configuration:

  • Bulk COPY (default): Used for scheduled, batch-oriented ingestion. This minimizes Snowflake warehouse usage and keeps compute costs predictable.
  • Snowpipe Streaming with Delta Updates: Used for near real-time ingestion. Data is written directly into Snowflake tables without waking a warehouse.

The ingestion method is controlled by:

  • The materialization sync schedule
  • Whether Delta Updates are enabled on individual bindings

For batch workloads, configuring the Snowflake warehouse to auto-suspend after 60 seconds helps control costs.
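
The auto-suspend interval is a one-line change. The sketch below issues it through the connection opened in the previous snippet; the warehouse name is a placeholder, and the role needs ALTER privileges on the warehouse.

```python
# Sketch: set a 60-second auto-suspend on the ingestion warehouse to keep batch
# COPY costs predictable. "INGEST_WH" is a placeholder; reuses the `conn`
# connection from the key-pair verification sketch above.
conn.cursor().execute("ALTER WAREHOUSE INGEST_WH SET AUTO_SUSPEND = 60")
```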

5. Bind collections to Snowflake tables

  • Select one or more collections to materialize
  • For each binding:
    • Optionally adjust the target Snowflake table name
    • Optionally enable Delta Updates

When Delta Updates are enabled:

  • Estuary uses Snowpipe Streaming
  • Existing rows are not read back during merges
  • Latency and compute costs are reduced

Delta Updates are best suited for datasets with stable, unique keys.

6. Publish the materialization

  • Review the materialization configuration
  • Click Publish to activate the pipeline

Once active:

  • Snowflake tables are created automatically
  • New or updated SharePoint files are reflected in Snowflake
  • Data is incrementally updated based on file changes
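
Once the first sync completes, a quick row count confirms data is arriving. The schema and table names below are placeholders for whatever your binding created, and the connection is assumed to be set up as in the key-pair sketch above.

```python
# Sketch: confirm that SharePoint-derived rows are landing in Snowflake.
# "SHAREPOINT.QUARTERLY_REPORTS" is a placeholder for the table created by your
# binding; `conn` is a connection configured as in the earlier example.
row_count = conn.cursor().execute(
    "SELECT COUNT(*) FROM SHAREPOINT.QUARTERLY_REPORTS"
).fetchone()[0]
print(f"{row_count} rows materialized so far")
```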

Method 2: Azure Data Factory

Azure Data Factory is a common choice for moving data from SharePoint to Snowflake in organizations that are already standardized on Microsoft and the Azure ecosystem. It is often used for scheduled, batch-oriented ingestion rather than continuous data movement.

This method is a good fit when SharePoint data is exported periodically, and freshness requirements are measured in hours rather than minutes.

What it is and when it’s a good fit

Azure Data Factory is a managed ETL and data integration service that provides built-in connectors for SharePoint Online and Snowflake.

It works best when:

  • Data is extracted on a fixed schedule
  • SharePoint files are relatively static
  • The organization already uses Azure for orchestration and data movement
  • Near real-time ingestion is not required

How it works (high-level data flow)

At a high level, the pipeline follows this pattern:

  • Azure Data Factory connects to SharePoint Online using Microsoft authentication
  • Files are read from a document library or folder
  • Data is staged or transformed within ADF
  • Processed data is loaded into Snowflake using a Snowflake sink

In most deployments, data is moved in batches and written to Snowflake using COPY-based ingestion.
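
Regardless of the orchestration layer, COPY-based ingestion ultimately reduces to staging files and running a COPY INTO statement against the target table. The sketch below shows that underlying pattern through the Python connector, with placeholder stage, table, and file-format settings; it is not ADF-specific code.

```python
# Sketch of COPY-based ingestion, the pattern most batch pipelines into Snowflake
# resolve to: files land in a stage, then COPY INTO loads them into a table.
# Stage, table, and format names are placeholders; assumes a `conn` connection
# created with snowflake-connector-python as in the earlier example.
copy_sql = """
    COPY INTO SHAREPOINT.QUARTERLY_REPORTS
    FROM @SHAREPOINT.LANDING_STAGE/quarterly-reports/
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    ON_ERROR = 'ABORT_STATEMENT'
"""
for row in conn.cursor().execute(copy_sql):
    # Each result row reports a loaded file, its status, and its row counts.
    print(row)
```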

Pros

  • Native integration within the Microsoft and Azure ecosystem
  • Familiar tooling for teams already using Azure
  • Supports a wide range of source and destination systems

Cons

  • Primarily batch-oriented rather than real-time
  • Complex pipelines for schema changes or nested data
  • Additional operational overhead for retries, monitoring, and failure handling

Key limitations and operational considerations

  • Authentication: Requires managing Microsoft credentials for SharePoint and separate credentials for Snowflake.
  • Incremental loads: Incremental file detection is limited and often requires custom logic.
  • Schema drift: Schema changes in files may require pipeline updates and redeployment.
  • Cost management: Costs can increase with frequent runs, large files, or complex transformations.

Effort level and ideal users

  • Effort level: Medium
  • Best suited for: Data engineers and platform teams already invested in Azure Data Factory

Method 3: Custom Pipeline Using Microsoft Graph API and Snowflake

A custom pipeline using the Microsoft Graph API and Snowflake ingestion tools is a DIY approach for teams that need full control over how SharePoint data is extracted, transformed, and loaded. This method is typically used when requirements cannot be met by managed connectors or when ingestion logic must be tightly customized.

This approach trades simplicity for flexibility and requires ongoing engineering investment.

What it is and when it’s a good fit

This method involves building and operating a custom ingestion pipeline that:

  • Reads files from SharePoint using the Microsoft Graph API
  • Processes and transforms data using custom code
  • Loads data into Snowflake using COPY, Snowpipe, or Snowpipe Streaming

It is a reasonable fit when:

  • File handling or transformation logic is highly specialized
  • SharePoint access patterns are non-standard
  • The team already operates a custom orchestration infrastructure
  • There is sufficient engineering capacity to maintain the pipeline

How it works (high-level data flow)

A typical architecture looks like this:

  • A custom service or job authenticates to Microsoft Graph using OAuth
  • Files are listed and downloaded from SharePoint document libraries
  • Data is parsed and transformed in code
  • Files or rows are staged in cloud storage or memory
  • Data is loaded into Snowflake using COPY, Snowpipe, or Snowpipe Streaming
  • Orchestration is handled by tools such as Airflow, cron, or Databricks Jobs

The ingestion pattern depends heavily on how incremental changes are detected and tracked.
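
As a rough illustration of the extraction half of such a pipeline, the sketch below authenticates to Microsoft Graph with client credentials, lists the files in one document library folder, and uses a modification-time checkpoint as a naive form of incremental detection. The tenant, client, and drive identifiers are placeholders, and production code would add paging, retries, error handling, and durable checkpoint storage before handing the files to a Snowflake load step.

```python
# Sketch: list and download recently modified files from a SharePoint document
# library via Microsoft Graph. Placeholders: TENANT_ID, CLIENT_ID, CLIENT_SECRET,
# DRIVE_ID, and the checkpoint timestamp. Requires: pip install msal requests
from datetime import datetime, timezone

import msal
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
DRIVE_ID = "<drive-id>"  # discoverable via the Graph /sites endpoints

# 1. Authenticate with client credentials (application permissions on Graph).
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
headers = {"Authorization": f"Bearer {token['access_token']}"}

# 2. List the items in one folder of the document library (no paging handled here).
folder_url = (
    f"https://graph.microsoft.com/v1.0/drives/{DRIVE_ID}"
    "/root:/quarterly-reports:/children"
)
items = requests.get(folder_url, headers=headers).json().get("value", [])

# 3. Naive incremental detection: only take files modified since the last run.
last_checkpoint = datetime(2024, 1, 1, tzinfo=timezone.utc)  # load from real storage
for item in items:
    if "file" not in item:
        continue  # skip subfolders
    modified = datetime.fromisoformat(item["lastModifiedDateTime"].replace("Z", "+00:00"))
    if modified <= last_checkpoint:
        continue
    # 4. Download the file content; from here, parse and stage it, then load it
    #    into Snowflake with COPY, Snowpipe, or Snowpipe Streaming.
    content = requests.get(
        f"https://graph.microsoft.com/v1.0/drives/{DRIVE_ID}/items/{item['id']}/content",
        headers=headers,
    ).content
    print(item["name"], len(content), "bytes")
```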

Pros

  • Full control over extraction and transformation logic
  • Can be adapted to unusual file formats or workflows
  • No dependency on managed ingestion platforms

Cons

  • High development and maintenance effort
  • Manual handling of retries, failures, and backpressure
  • Authentication, rate limiting, and API quotas must be managed explicitly

Key limitations and operational considerations

  • Authentication: Requires managing OAuth tokens and permissions for Microsoft Graph, as well as Snowflake credentials.
  • Incremental loads: Change detection must be implemented manually, often using file metadata, timestamps, or checkpoints.
  • Schema drift: Schema changes require code updates and redeployment.
  • Reliability and observability: Logging, monitoring, alerting, and recovery logic must be built and maintained by the team.
  • Cost predictability: Snowflake ingestion and compute costs depend on how efficiently data is staged and loaded.

Effort level and ideal users

  • Effort level: High
  • Best suited for: Senior data engineers and platform teams with custom requirements and dedicated ownership

Comparison of SharePoint to Snowflake Methods

Criteria | Estuary | Azure Data Factory | Custom Graph API Pipeline
--- | --- | --- | ---
Setup time | Low | Medium | High
Right-time capability | Yes (via Snowpipe Streaming with Delta Updates or sync schedules) | Limited (batch-oriented) | Custom, depends on implementation
Maintenance effort | Low | Medium | High
Cost predictability | High | Medium | Low
Scalability | High | Medium | Custom
Schema handling | Automatic schema inference and evolution | Manual or pipeline-defined | Manual, code-driven
Incremental loads | Built-in | Limited, often custom | Fully custom
Governance and observability | Built-in monitoring and controls | Partial, Azure-native | Fully manual

Final Recommendation

If your goal is to load SharePoint data into Snowflake with minimal setup, predictable costs, and support for incremental updates, Estuary is the most practical choice. It provides a managed SharePoint source connector, flexible Snowflake ingestion options, and built-in operational visibility.

Azure Data Factory is a reasonable alternative for teams already standardized on Azure and comfortable with batch-oriented pipelines, but it requires more configuration and ongoing management.

A custom pipeline using the Microsoft Graph API and Snowflake ingestion tools should be reserved for cases where managed solutions cannot meet specific requirements and where the team is prepared to own long-term maintenance and reliability.

FAQs

Can SharePoint be used as a reliable data source for Snowflake?

Yes, when data is stored in SharePoint document libraries and ingestion is designed around file-based change detection. SharePoint does not provide row-level change data capture, so ingestion tools rely on polling and file metadata such as modification timestamps to detect updates.

Does SharePoint support real-time streaming into Snowflake?

Not in the database sense. SharePoint ingestion is file-centric and typically near real-time at best, depending on polling frequency and file update behavior. True sub-second streaming is not available, but low-latency ingestion can be achieved with frequent syncs and streaming-based Snowflake ingestion methods.

What kinds of SharePoint data can be loaded into Snowflake?

Files stored in SharePoint document libraries can be loaded into Snowflake, including formats such as CSV, JSON, Avro, and compressed archives. SharePoint Lists and list item metadata require different integration approaches and are not covered by file-based ingestion methods.

How does Azure Data Factory compare with Estuary for this integration?

Azure Data Factory is commonly used in Microsoft-centric environments but is primarily batch-oriented and requires more configuration for incremental loads and schema changes. Estuary provides a more direct and operationally simple approach for continuously loading SharePoint files into Snowflake with built-in support for incremental updates.

When is a custom pipeline worth building?

A custom pipeline is appropriate when ingestion logic is highly specialized or when managed connectors cannot meet specific requirements. In most cases, the additional development and maintenance effort outweighs the benefits unless there is a clear need for full customization.

About the author

Emily Lucek, Technical Content Creator

Emily is a software engineer and technical content creator with an interest in developer education. She has experience across Developer Relations roles from her FinTech background and is always learning something new.
