
This tutorial shows how to load data from Microsoft SharePoint into Snowflake in a reliable, production-ready way.
It covers three practical methods used by data teams today, starting with Estuary, which captures files from SharePoint document libraries and writes them directly into Snowflake tables.
The goal is to help you choose the right approach based on setup time, data freshness, and ongoing operational effort.
Key Takeaways
- SharePoint document libraries can be loaded into Snowflake without building custom ingestion code
- Estuary provides a managed way to capture files from SharePoint and materialize them into Snowflake tables
- File formats such as CSV, JSON, and compressed archives can be parsed and structured automatically
- Snowflake ingestion supports incremental updates and Snowpipe Streaming for low-latency loads
- Alternative native and DIY approaches exist but require more setup and ongoing operational effort
Method 1: Load Data from SharePoint to Snowflake Using Estuary (Step-by-step)
This method uses Estuary, the right-time data platform, to capture files from Microsoft SharePoint document libraries and load them into Snowflake tables. Estuary structures SharePoint files as collections and materializes those collections into Snowflake using the appropriate ingestion method based on your configuration.
Step 1: Create and Configure a SharePoint Capture
In the Estuary dashboard, use the left-hand navigation to go to Sources, then click +New Capture.
1. Select the SharePoint connector
In the connector selection screen, search for SharePoint.
- Select the SharePoint source connector
- Enter a capture name
- Select the data plane where the capture will run
This connector reads files from SharePoint document libraries and converts them into structured records.
2. Authenticate with Microsoft
In the authentication section, click Authenticate your Microsoft account.
- Complete the OAuth2 sign-in flow in the popup window
- Grant Estuary permission to access SharePoint Online
Authentication is handled using OAuth2 and is managed directly in the Estuary web application.
3. Configure the SharePoint site and folder
In the Site Configuration section, choose the URL method, which is the recommended option.
- In the Site URL field, paste the full SharePoint folder URL, for example:
```plaintext
https://contoso.sharepoint.com/sites/Marketing/Shared Documents/quarterly-reports
```
Estuary automatically parses the URL to determine:
- The SharePoint site
- The document library
- The folder path to monitor
The Components method is also available for advanced use cases where the site ID and drive ID are already known, but it is not required for most SharePoint to Snowflake pipelines.
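For intuition, here is a rough sketch of how a folder URL like the one above breaks down into those three parts. It is purely illustrative Python using the standard library, not the connector's actual parsing logic:

```python
from urllib.parse import unquote, urlparse

# Illustrative only: a SharePoint folder URL follows the shape
# /sites/<site-name>/<document-library>/<folder-path...>
url = "https://contoso.sharepoint.com/sites/Marketing/Shared Documents/quarterly-reports"

parts = [unquote(p) for p in urlparse(url).path.strip("/").split("/")]

site_name = parts[1]                 # "Marketing"
document_library = parts[2]          # "Shared Documents"
folder_path = "/".join(parts[3:])    # "quarterly-reports"

print(site_name, document_library, folder_path)
```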
4. Configure optional file filtering and parsing
In the same capture configuration screen, you can optionally refine which files are captured and how they are parsed.
- Match Keys
Use a regular expression to filter files, such as by path or extension. This regex, for example, would only capture CSV files:
```plaintext
.*\.csv
```
- Parser Configuration
By default, Estuary automatically detects:
- File formats such as CSV, JSON, Avro, Protobuf, or W3C logs
- Compression such as ZIP, GZIP, or ZSTD
If automatic detection is not sufficient, you can explicitly configure:
- File format
- Compression type
- CSV-specific settings such as delimiter, headers, encoding, quoting, and line endings
For most SharePoint document libraries, automatic parsing works without additional configuration.
5. Publish the capture
- Review the capture configuration
- Click Publish to activate the capture
Once published:
- Estuary scans the specified SharePoint folder
- New and updated files are detected during each sync
- File contents are parsed into structured records and written to Estuary collections
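To make "structured records" a little more concrete, the sketch below parses a small CSV payload into JSON-like records. The record shape and metadata shown here are hypothetical and for illustration only; Estuary's actual collection documents carry their own metadata and an inferred schema:

```python
import csv
import io
import json

# Hypothetical example of turning CSV rows into structured records.
csv_bytes = b"region,revenue\nEMEA,120000\nAPAC,95000\n"

records = [
    {**row, "_meta": {"file": "quarterly-reports/q1.csv"}}  # illustrative metadata field
    for row in csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
]

print(json.dumps(records, indent=2))
```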
Step 2: Verify the Captured Collections
After publishing the capture, use the left-hand navigation to go to Collections.
- Locate the collection created by the SharePoint capture
- Open the collection to preview sample records
Each collection contains:
- Structured records derived from SharePoint files
- An inferred JSON schema
- Metadata such as publish timestamps
Verifying collections at this stage ensures the SharePoint data is correctly parsed before loading it into Snowflake.
Step 3: Create and Configure a Snowflake Materialization
In the Estuary dashboard, use the left-hand navigation to go to Destinations, then click New Materialization.
1. Select the Snowflake connector
- Search for Snowflake
- Select the Snowflake materialization connector
- Enter a materialization name
- Select the data plane
2. Configure Snowflake connection details
In the Endpoint Configuration section, provide the Snowflake connection details:
- Host: Your Snowflake account URL without the protocol (Example: orgname-accountname.snowflakecomputing.com)
- Database: Name of the target Snowflake database
- Schema: Target schema where tables will be created
- Warehouse: Virtual warehouse used for ingestion
- Role: Role assigned to the Snowflake user
Select the appropriate timestamp type mapping based on your Snowflake environment.
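If the Snowflake objects referenced above do not exist yet, a minimal setup sketch using the Python connector might look like the following. All object names (ESTUARY_DB, ESTUARY_WH, ESTUARY_ROLE, ESTUARY_USER) are placeholders, the script assumes the Estuary user already exists, and the grants should be adapted to your own security model:

```python
import snowflake.connector

# Run with a role that can create databases, warehouses, and roles
# (for example ACCOUNTADMIN, or a delegated admin role).
conn = snowflake.connector.connect(
    account="orgname-accountname",
    user="ADMIN_USER",
    password="...",
    role="ACCOUNTADMIN",
)

statements = [
    "CREATE DATABASE IF NOT EXISTS ESTUARY_DB",
    "CREATE SCHEMA IF NOT EXISTS ESTUARY_DB.PUBLIC",
    "CREATE WAREHOUSE IF NOT EXISTS ESTUARY_WH WITH WAREHOUSE_SIZE = 'XSMALL'",
    "CREATE ROLE IF NOT EXISTS ESTUARY_ROLE",
    "GRANT USAGE ON DATABASE ESTUARY_DB TO ROLE ESTUARY_ROLE",
    "GRANT USAGE, CREATE TABLE ON SCHEMA ESTUARY_DB.PUBLIC TO ROLE ESTUARY_ROLE",
    "GRANT USAGE ON WAREHOUSE ESTUARY_WH TO ROLE ESTUARY_ROLE",
    "GRANT ROLE ESTUARY_ROLE TO USER ESTUARY_USER",  # assumes the user already exists
]

with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```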
3. Authenticate using key-pair (JWT)
In the Credentials section:
- Enter the Snowflake username
- Paste or upload the private key associated with the user
Snowflake is deprecating single-factor password authentication, and Snowpipe Streaming requires key-pair (JWT) authentication.
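If you still need to generate a key pair, the sketch below uses the cryptography package to produce a PKCS#8 private key (the value to paste into the Credentials section) and the base64-encoded public key to register on the Snowflake user. The user name is a placeholder:

```python
import base64

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Generate a 2048-bit RSA key pair for Snowflake key-pair (JWT) authentication.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# PKCS#8 PEM private key: paste or upload this in Estuary's Credentials section.
private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
print(private_pem.decode())

# Base64-encoded DER public key: register it on the Snowflake user, e.g.
# ALTER USER ESTUARY_USER SET RSA_PUBLIC_KEY = '<value printed below>';
public_der = key.public_key().public_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
print(base64.b64encode(public_der).decode())
```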
4. Configure sync behavior and ingestion mode
Estuary supports multiple Snowflake ingestion methods, selected based on your configuration:
- Bulk COPY (default): Used for scheduled, batch-oriented ingestion. This minimizes Snowflake warehouse usage and keeps compute costs predictable.
- Snowpipe Streaming with Delta Updates: Used for near real-time ingestion. Data is written directly into Snowflake tables without waking a warehouse.
The ingestion method is controlled by:
- The materialization sync schedule
- Whether Delta Updates are enabled on individual bindings
For batch workloads, configuring the Snowflake warehouse to auto-suspend after 60 seconds helps control costs.
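As a concrete example, the snippet below applies that setting with the Python connector. The warehouse name is a placeholder; use the warehouse you configured for this materialization:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="orgname-accountname",
    user="ADMIN_USER",
    password="...",
    role="SYSADMIN",
)
with conn.cursor() as cur:
    # Suspend the warehouse after 60 idle seconds and resume it on demand.
    cur.execute("ALTER WAREHOUSE ESTUARY_WH SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE")
conn.close()
```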
5. Bind collections to Snowflake tables
- Select one or more collections to materialize
- For each binding:
- Optionally adjust the target Snowflake table name
- Optionally enable Delta Updates
When Delta Updates are enabled:
- Estuary uses Snowpipe Streaming
- Existing rows are not read back during merges
- Latency and compute costs are reduced
Delta Updates are best suited for datasets with stable, unique keys.
6. Publish the materialization
- Review the materialization configuration
- Click Publish to activate the pipeline
Once active:
- Snowflake tables are created automatically
- New or updated SharePoint files are reflected in Snowflake
- Data is incrementally updated based on file changes
Method 2: Azure Data Factory
Azure Data Factory is a common choice for moving data from SharePoint to Snowflake in organizations that are already standardized on Microsoft and the Azure ecosystem. It is often used for scheduled, batch-oriented ingestion rather than continuous data movement.
This method is a good fit when SharePoint data is exported periodically, and freshness requirements are measured in hours rather than minutes.
What it is and when it’s a good fit
Azure Data Factory is a managed ETL and data integration service that provides built-in connectors for SharePoint Online and Snowflake.
It works best when:
- Data is extracted on a fixed schedule
- SharePoint files are relatively static
- The organization already uses Azure for orchestration and data movement
- Near real-time ingestion is not required
How it works (high-level data flow)
At a high level, the pipeline follows this pattern:
- Azure Data Factory connects to SharePoint Online using Microsoft authentication
- Files are read from a document library or folder
- Data is staged or transformed within ADF
- Processed data is loaded into Snowflake using a Snowflake sink
In most deployments, data is moved in batches and written to Snowflake using COPY-based ingestion.
Pros
- Native integration within the Microsoft and Azure ecosystem
- Familiar tooling for teams already using Azure
- Supports a wide range of source and destination systems
Cons
- Primarily batch-oriented rather than real-time
- Pipelines become complex when handling schema changes or nested data
- Additional operational overhead for retries, monitoring, and failure handling
Key limitations and operational considerations
- Authentication: Requires managing Microsoft credentials for SharePoint and separate credentials for Snowflake.
- Incremental loads: Incremental file detection is limited and often requires custom logic.
- Schema drift: Schema changes in files may require pipeline updates and redeployment.
- Cost management: Costs can increase with frequent runs, large files, or complex transformations.
Effort level and ideal users
- Effort level: Medium
- Best suited for: Data engineers and platform teams already invested in Azure Data Factory
Method 3: Custom Pipeline Using Microsoft Graph API and Snowflake
A custom pipeline using the Microsoft Graph API and Snowflake ingestion tools is a DIY approach for teams that need full control over how SharePoint data is extracted, transformed, and loaded. This method is typically used when requirements cannot be met by managed connectors or when ingestion logic must be tightly customized.
This approach trades simplicity for flexibility and requires ongoing engineering investment.
What it is and when it’s a good fit
This method involves building and operating a custom ingestion pipeline that:
- Reads files from SharePoint using the Microsoft Graph API
- Processes and transforms data using custom code
- Loads data into Snowflake using COPY, Snowpipe, or Snowpipe Streaming
It is a reasonable fit when:
- File handling or transformation logic is highly specialized
- SharePoint access patterns are non-standard
- The team already operates a custom orchestration infrastructure
- There is sufficient engineering capacity to maintain the pipeline
How it works (high-level data flow)
A typical architecture looks like this:
- A custom service or job authenticates to Microsoft Graph using OAuth
- Files are listed and downloaded from SharePoint document libraries
- Data is parsed and transformed in code
- Files or rows are staged in cloud storage or memory
- Data is loaded into Snowflake using COPY, Snowpipe, or Snowpipe Streaming
- Orchestration is handled by tools such as Airflow, cron, or Databricks Jobs
The ingestion pattern depends heavily on how incremental changes are detected and tracked.
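To make the moving parts concrete, below is a condensed sketch of such a pipeline. It assumes an Azure AD app registration with application permission to read the site, a known SharePoint site ID, CSV files in a folder of the site's default document library, an existing target table, and key-pair authentication to Snowflake; every name and environment variable is a placeholder:

```python
import os

import requests
import snowflake.connector

TENANT_ID = os.environ["AZURE_TENANT_ID"]
CLIENT_ID = os.environ["AZURE_CLIENT_ID"]
CLIENT_SECRET = os.environ["AZURE_CLIENT_SECRET"]
SITE_ID = os.environ["SHAREPOINT_SITE_ID"]  # e.g. resolved via GET /sites/{hostname}:/sites/{site-name}


def get_graph_token() -> str:
    """Acquire an app-only token via the OAuth2 client-credentials flow."""
    resp = requests.post(
        f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
        data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "grant_type": "client_credentials",
            "scope": "https://graph.microsoft.com/.default",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


def list_files(token: str, folder: str) -> list[dict]:
    """List items in a folder of the site's default document library (pagination omitted)."""
    url = f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/drive/root:/{folder}:/children"
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()
    return resp.json()["value"]


def download_file(item: dict, dest_dir: str = "/tmp") -> str:
    """Download a drive item to a local path using its pre-authenticated download URL."""
    local_path = os.path.join(dest_dir, item["name"])
    with requests.get(item["@microsoft.graph.downloadUrl"], stream=True, timeout=120) as resp:
        resp.raise_for_status()
        with open(local_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return local_path


def load_into_snowflake(paths: list[str]) -> None:
    """Stage downloaded CSVs in the table stage and COPY them into the target table."""
    conn = snowflake.connector.connect(
        account="orgname-accountname",
        user="PIPELINE_USER",
        private_key_file=os.environ["SNOWFLAKE_PRIVATE_KEY_PATH"],  # recent connector versions
        warehouse="PIPELINE_WH",
        database="ANALYTICS",
        schema="SHAREPOINT",
    )
    with conn.cursor() as cur:
        for path in paths:
            cur.execute(f"PUT 'file://{path}' @%QUARTERLY_REPORTS AUTO_COMPRESS=TRUE")
        cur.execute(
            "COPY INTO QUARTERLY_REPORTS "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) ON_ERROR = 'ABORT_STATEMENT'"
        )
    conn.close()


if __name__ == "__main__":
    token = get_graph_token()
    items = list_files(token, "quarterly-reports")  # path relative to the document library root
    csv_paths = [download_file(i) for i in items if i["name"].endswith(".csv")]
    load_into_snowflake(csv_paths)
```

A production version would also need pagination of Graph responses, retries, structured logging, and alerting, which is where most of the ongoing maintenance effort in this method goes.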
Pros
- Full control over extraction and transformation logic
- Can be adapted to unusual file formats or workflows
- No dependency on managed ingestion platforms
Cons
- High development and maintenance effort
- Manual handling of retries, failures, and backpressure
- Authentication, rate limiting, and API quotas must be managed explicitly
Key limitations and operational considerations
- Authentication: Requires managing OAuth tokens and permissions for Microsoft Graph, as well as Snowflake credentials.
- Incremental loads: Change detection must be implemented manually, often using file metadata, timestamps, or checkpoints (see the sketch after this list).
- Schema drift: Schema changes require code updates and redeployment.
- Reliability and observability: Logging, monitoring, alerting, and recovery logic must be built and maintained by the team.
- Cost predictability: Snowflake ingestion and compute costs depend on how efficiently data is staged and loaded.
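Continuing the earlier sketch, one common pattern for change detection is a simple timestamp watermark stored between runs. The checkpoint file name and structure here are arbitrary illustration choices; lastModifiedDateTime is a standard Microsoft Graph driveItem field:

```python
import json
import os
from datetime import datetime, timezone

CHECKPOINT_PATH = "graph_checkpoint.json"  # arbitrary location for this sketch


def _parse_ts(value: str) -> datetime:
    # Graph returns timestamps like "2024-05-01T12:34:56Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_watermark() -> datetime:
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return _parse_ts(json.load(f)["last_modified"])
    return datetime.min.replace(tzinfo=timezone.utc)


def save_watermark(ts: datetime) -> None:
    # Persist only after the Snowflake load succeeds, so failed runs are retried.
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump({"last_modified": ts.isoformat()}, f)


def select_changed(items: list[dict]) -> tuple[list[dict], datetime]:
    """Filter Graph driveItems to those modified after the stored watermark."""
    watermark = load_watermark()
    changed = [i for i in items if _parse_ts(i["lastModifiedDateTime"]) > watermark]
    new_watermark = max((_parse_ts(i["lastModifiedDateTime"]) for i in changed), default=watermark)
    return changed, new_watermark
```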
Effort level and ideal users
- Effort level: High
- Best suited for: Senior data engineers and platform teams with custom requirements and dedicated ownership
Comparison of SharePoint to Snowflake Methods
| Criteria | Estuary | Azure Data Factory | Custom Graph API Pipeline |
|---|---|---|---|
| Setup time | Low | Medium | High |
| Right-time capability | Yes (via Snowpipe Streaming with Delta Updates or sync schedules) | Limited (batch-oriented) | Custom, depends on implementation |
| Maintenance effort | Low | Medium | High |
| Cost predictability | High | Medium | Low |
| Scalability | High | Medium | Custom |
| Schema handling | Automatic schema inference and evolution | Manual or pipeline-defined | Manual, code-driven |
| Incremental loads | Built-in | Limited, often custom | Fully custom |
| Governance and observability | Built-in monitoring and controls | Partial, Azure-native | Fully manual |
Final Recommendation
If your goal is to load SharePoint data into Snowflake with minimal setup, predictable costs, and support for incremental updates, Estuary is the most practical choice. It provides a managed SharePoint source connector, flexible Snowflake ingestion options, and built-in operational visibility.
Azure Data Factory is a reasonable alternative for teams already standardized on Azure and comfortable with batch-oriented pipelines, but it requires more configuration and ongoing management.
A custom pipeline using the Microsoft Graph API and Snowflake ingestion tools should be reserved for cases where managed solutions cannot meet specific requirements and where the team is prepared to own long-term maintenance and reliability.
FAQs
Does SharePoint support real-time data ingestion into Snowflake?
SharePoint itself does not push data anywhere; a pipeline has to detect new or updated files and load them. Estuary detects changes during each sync and can write to Snowflake with low latency using Snowpipe Streaming with Delta Updates, while Azure Data Factory and most DIY pipelines run in scheduled batches.
What types of SharePoint data can be loaded into Snowflake?
Files stored in SharePoint document libraries, including CSV, JSON, Avro, Protobuf, and W3C log formats, as well as compressed archives such as ZIP, GZIP, and ZSTD. The files are parsed into structured records before being written to Snowflake tables.
Is Azure Data Factory better than Estuary for SharePoint to Snowflake ingestion?
Azure Data Factory is a reasonable choice for teams already standardized on Azure and comfortable with batch-oriented pipelines. Estuary typically requires less setup, handles incremental loads and schema inference automatically, and supports near real-time ingestion.
When should I build a custom SharePoint to Snowflake pipeline?
Only when managed connectors cannot meet highly specialized extraction or transformation requirements and your team has the engineering capacity to own authentication, change detection, monitoring, and long-term maintenance.

About the author
Emily is a software engineer and technical content creator with an interest in developer education. She has experience across Developer Relations roles from her FinTech background and is always learning something new.