Master your data journey: watch, learn, build
Dive into our library of real-time data integration insights and tutorials. Whether you're starting out or scaling up, Estuary empowers your data-driven success.

Right-time Data Integration for Snowflake
Hosted by: Dani & Ben Rogojan (Seattle Data Guy)
Choosing the right ingestion strategy for Snowflake can dramatically influence your latency, cost, and operational overhead. In this live 1-hour session, Dani and Ben Rogojan will break down the modern ingestion landscape and help you understand when batch, micro-batch, serverless, or real-time streaming pipelines make the most sense. This webinar is designed for data engineers, architects, and Snowflake users who want a clearer framework for making ingestion decisions, without the guesswork.

Estuary 101: How To Build Right-Time Data Pipelines
Join hosts Dani and Zulf for a fast-paced walkthrough of how to design and ship right-time data pipelines with Estuary. In this session, you’ll get:
- Context: What “right-time” really means, where Estuary fits among batch vs. streaming and managed vs. self-hosted options, and why unified ingestion reduces cost and complexity.
- Live End-to-End Demo: Connect CDC sources, apply declarative transformations, and materialize data simultaneously into a warehouse, analytical engines, and object storage—plus a look at observability, error recovery, and real-world scenarios like schema drift and backfills.
- Live Q&A: Ask about your specific stack, pipeline designs, and how to scale Estuary for enterprise workloads.
Perfect for data and analytics engineers, architects, and platform owners who want fresher data with fewer moving parts.

How to Stream Data into Snowflake
Ingest data into a Snowflake warehouse using real-time Snowpipe Streaming or batch COPY INTO commands. Estuary makes Snowflake integration simple with pre-built no-code connectors.
Following along? Find the copy/pasteable commands in Estuary’s Snowflake docs: https://docs.estuary.dev/reference/Connectors/materialization-connectors/Snowflake/
- Set up your first data pipeline for free at Estuary: https://dashboard.estuary.dev/register/?utm_source=youtube&utm_medium=social&utm_campaign=snowflake_ingestion
- Learn more about Estuary’s Snowflake capabilities: https://estuary.dev/solutions/technology/real-time-snowflake-streaming/
- Read the complete guide to Snowpipe Streaming: https://estuary.dev/blog/snowpipe-streaming-fast-snowflake-ingestion/
- Discover how Snowflake fared in Estuary’s Data Warehouse Benchmark: https://estuary.dev/data-warehouse-benchmark-report/
- Download the Snowflake Ingestion Playbook: https://estuary.dev/snowflake-ingestion-whitepaper/
FAQ
1. What is the fastest way to load data into Snowflake? Snowpipe Streaming with row-based ingestion. In Estuary, you can enable it per table using Delta Updates.
2. Why use key pair authentication for Snowflake? It provides strong security, short-lived tokens, and is Snowflake’s recommended approach for service integrations like Estuary.
3. Can I mix real-time and batch ingestion in the same pipeline? Yes. With Estuary’s Snowflake connector, you can run some tables in batch (COPY INTO or Snowpipe) and others in real time with Snowpipe Streaming.
Media resources used in this video are from Pexels, Canva, and the YouTube Studio Audio Library.
0:00 Introduction 1:05 Snowflake concerns 1:51 Ingestion options 3:23 Beginning the demo 3:47 Create Snowflake resources 4:28 User auth setup 5:17 Estuary connector config 6:44 Customization options 8:07 Wrapping up
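If you want to sanity-check the key pair authentication from the FAQ before pasting credentials into the connector config, a minimal Python sketch could look like the following. The account, user, warehouse, and key file names are placeholders, not values from the video; the public half of the key must already be registered on the Snowflake user.

```python
# pip install snowflake-connector-python cryptography
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the RSA private key whose public half is registered on the Snowflake user.
with open("rsa_key.p8", "rb") as f:
    private_key = serialization.load_pem_private_key(f.read(), password=None)

key_bytes = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="your_account_identifier",  # placeholder
    user="ESTUARY_USER",                # placeholder service user
    private_key=key_bytes,
    warehouse="COMPUTE_WH",             # placeholder
)

# Quick check that the key pair works before configuring the connector.
with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_USER(), CURRENT_ROLE()")
    print(cur.fetchone())
```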

The Rise of Lake Houses: Are They the Future of Data Warehousing? #podcast #interview #seattledataguy

What’s Next for Data Warehouses? Lessons from Our Benchmark and Emerging Trends
Dani and Ben talk about key findings on performance ceilings, cost traps, and failure modes, and explore the major trends reshaping data warehouse architecture, including:
- Separation of Compute & Storage: How Snowflake Gen2, Databricks serverless, and open table formats like Iceberg are changing the game.
- Lakehouse Reality Check: What’s working for teams adopting Iceberg, schema evolution patterns, and lake-native pipelines.
- Flexibility Over Centralization: Moving beyond “one warehouse to rule them all.”

Capture Data from Oracle Using CDC (Docker Demo)
Don’t silo your data in Oracle: learn how to replicate it to a destination of your choice with CDC. We’ll cover archive log configuration and Estuary setup with a demo Oracle instance. Follow along! This example project is available at: https://github.com/estuary/examples/tree/main/oracle-capture - Try it out for free at Estuary: https://dashboard.estuary.dev/register/?utm_source=youtube&utm_medium=social&utm_campaign=oracle_capture - Reference Estuary’s Oracle docs, including instructions for non-container databases: https://docs.estuary.dev/reference/Connectors/capture-connectors/OracleDB/ - Have questions? Contact us on Slack: https://go.estuary.dev/slack Media resources used in this video are from Pexels, Canva, and the YouTube Studio Audio Library. 0:00 Introduction 0:40 Oracle & CDC 2:30 Demo: Project overview 5:42 Demo: Run container 7:22 Demo: Estuary setup 8:50 Wrapping up
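As a companion to the archive-log discussion, here is a small, hedged check you could run against a demo instance to confirm the CDC prerequisites are in place. The connection details below are placeholders; the authoritative setup steps are in the Oracle docs linked above.

```python
# pip install oracledb
import oracledb

# Placeholders: adjust for your own demo instance (the linked example project runs Oracle in Docker).
conn = oracledb.connect(user="c##estuary", password="secret", dsn="localhost:1521/ORCLCDB")

with conn.cursor() as cur:
    # CDC requires ARCHIVELOG mode and supplemental logging to be enabled.
    cur.execute("SELECT log_mode, supplemental_log_data_min FROM v$database")
    log_mode, supplemental = cur.fetchone()
    print(f"log_mode={log_mode}, supplemental_log_data_min={supplemental}")
```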

Stream CRM Data from HubSpot to MotherDuck
Need blazing-fast analytics to keep up with your customers? Try sending your HubSpot data to MotherDuck: Estuary makes the process simple and straightforward. Register for free at: https://dashboard.estuary.dev/register/?utm_source=youtube&utm_medium=social&utm_campaign=hubspot_motherduck Ready to dive in deeper? Try these resources: 📄 Hubspot capture connector docs: https://docs.estuary.dev/reference/Connectors/capture-connectors/hubspot/ 🐤 MotherDuck materialization docs: https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/ 💬 Our Slack community: https://go.estuary.dev/slack Music sourced from the YouTube Studio Audio Library 0:00 Introduction 0:20 Starting the pipeline 1:01 (Optional) Create HubSpot access token 1:39 Finish capture 2:15 MotherDuck materialization 2:51 Create staging bucket 4:40 MotherDuck credentials 5:20 Complete pipeline & wrap up

Dekaf: How to Use Kafka Minus the Kafka
Have you ever wanted Kafka’s real-time pub/sub benefits without implementing and maintaining a whole Kafka ecosystem yourself? Learn how with Dekaf. We’ll cover some Kafka basics to help explain how Estuary’s Kafka API compatibility layer fits seamlessly into a modern data architecture. - Register for a free Estuary account: https://dashboard.estuary.dev/register/?utm_source=youtube&utm_medium=social&utm_campaign=dekaf_video - Learn more about Dekaf: https://docs.estuary.dev/reference/Connectors/dekaf/ - Find the example kcat command: https://docs.estuary.dev/guides/dekaf_reading_collections_from_kafka/#2-set-up-your-kafka-client - Join us on Slack: https://go.estuary.dev/slack Media resources used in this video are from Pexels, Canva, and the YouTube Studio Audio Library. 0:00 Introduction 0:30 What is Kafka? 1:04 The Kafka Ecosystem 2:39 Integrating with Kafka Consumers... 3:26 ...using Dekaf 4:12 Dekaf Setup 6:16 kcat Test 6:45 Final Thoughts
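To make the "Kafka minus the Kafka" idea concrete, here is a sketch of a standard Kafka consumer pointed at a Dekaf endpoint. The bootstrap server, credentials, and collection name below are placeholders; take the real values from the Dekaf docs and the kcat example linked above.

```python
# pip install confluent-kafka
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "dekaf.estuary-data.example:9092",  # placeholder; see the Dekaf docs
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "{}",                                   # placeholder; see the Dekaf docs
    "sasl.password": "YOUR_ESTUARY_TOKEN",                   # placeholder
    "group.id": "dekaf-demo",
    "auto.offset.reset": "earliest",
})

# A Flow collection is exposed to Kafka clients as a topic.
consumer.subscribe(["acmeCo/my-collection"])  # placeholder collection name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        print(msg.topic(), msg.value())
finally:
    consumer.close()
```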

Save Webhook Data to Databricks in Real Time
Learn how to stream incoming webhook data to Databricks without setting up and maintaining your own server for webhook captures. Follow along with our 3-minute demo and try out Estuary Flow for free → https://dashboard.estuary.dev/register Learn more from our: - Website: https://estuary.dev/ - Webhook capture docs: https://docs.estuary.dev/reference/Connectors/capture-connectors/http-ingest/ - Databricks materialization docs: https://docs.estuary.dev/reference/Connectors/materialization-connectors/databricks/ - Blog article on webhook setup: https://estuary.dev/blog/webhook-setup/ 0:00 Intro 0:19 Set up webhook capture 1:19 Configure webhook in Square 2:08 Create Databricks materialization 2:43 Outro
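If you want to exercise the pipeline without waiting for Square to fire an event, a quick way to test the capture endpoint is a plain HTTP POST. The URL and payload below are illustrative placeholders; copy the real endpoint from your published capture.

```python
# pip install requests
import requests

# Placeholder URL: copy the real endpoint from the webhook capture's details page.
WEBHOOK_URL = "https://<your-capture-endpoint>/webhook-data"

event = {
    "type": "payment.updated",  # sample Square-style event, purely illustrative
    "data": {"id": "pmt_123", "amount": 4200, "currency": "USD"},
}

resp = requests.post(WEBHOOK_URL, json=event, timeout=10)
resp.raise_for_status()
print("Accepted:", resp.status_code)
```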

Create a Webhook-to-Snowflake Data Pipeline
Create a complete data pipeline in 3 minutes that captures Square (or any other platform's) webhooks and materializes to Snowflake. With Estuary Flow, you can create endpoints to receive webhook data without setting up and maintaining your own server. Try it out for free at → https://dashboard.estuary.dev/register Ready for more? - See our site: https://estuary.dev/ - Learn more about webhooks: https://estuary.dev/blog/webhook-setup/ - Read our webhook capture docs: https://docs.estuary.dev/reference/Connectors/capture-connectors/http-ingest/ - Or our Snowflake materialization docs: https://docs.estuary.dev/reference/Connectors/materialization-connectors/Snowflake/ 0:00 Intro 0:19 Set up webhook capture 1:17 Configure webhook in Square 2:06 Create Snowflake materialization 2:48 Outro

Estuary | The Right Time Data Platform
Welcome to Estuary, the Right Time Data Platform built for modern data teams. With Estuary, you can move and transform data between hundreds of systems at sub-second latency or in batch, depending on your business needs.
• Capture data from source systems using pre-built, no-code connectors.
• Automatically infer schemas and manage both real-time and historical events in collections.
• Materialize your data to any destination with ease and flexibility.
• Choose your deployment model: fully SaaS, Bring Your Own Cloud, or private deployment with enterprise-level security.
Start streaming an ocean of data and get going today: 🌊 https://dashboard.estuary.dev/register/?utm_source=youtube&utm_medium=social&utm_campaign=overview_video
Learn more: 🌐 On our site: https://www.estuary.dev/?utm_source=youtube&utm_medium=social&utm_campaign=flow_overview 📚 In our docs: https://docs.estuary.dev/?utm_source=youtube&utm_medium=social&utm_campaign=flow_overview
Connect with us: 💬 On Slack: https://go.estuary.dev/slack 🧑💻 In GitHub: https://github.com/estuary ℹ️ On LinkedIn: https://www.linkedin.com/company/estuary-tech/
#righttimedata #datapipelines #streamingdata #realtimeanalytics #CDC #dataengineering #Estuary

Capture NetSuite Data Using SuiteAnalytics and Estuary
Learn how to transfer your NetSuite data in minutes using SuiteAnalytics Connect and Estuary. We demo how to set up your capture step-by-step, covering all the resources you'll need to get your data flowing. Once connected, Estuary Flow ingests your NetSuite data in real time — ready to be streamed into cloud warehouses like Snowflake, BigQuery, and more. Whether you're building a NetSuite to Snowflake pipeline, syncing NetSuite to BigQuery, or integrating with other analytics tools, Estuary lets you do it in minutes. Looking for more? - Start building pipelines for free at: https://dashboard.estuary.dev/register - See Estuary's NetSuite SuiteAnalytics capture connector docs: https://docs.estuary.dev/reference/Connectors/capture-connectors/netsuite-suiteanalytics/ - View materialization options for your captured data: https://docs.estuary.dev/reference/Connectors/materialization-connectors/ - Chat with us in Slack: https://go.estuary.dev/slack 0:00 Intro 0:18 NetSuite setup 4:03 Creating the Estuary capture 6:10 Outro

Unify Your Data in Microsoft Fabric with Estuary
Want to get your data into Microsoft Fabric—fast and without writing code? Discover what unified data can do with a Microsoft Fabric integration. We’ll cover what makes this relatively new data platform unique and how you can enhance its capabilities further using Estuary. A step-by-step demo walks through Fabric warehouse connector setup in Estuary so you can get your data flowing. Interested in more? - Register for a free Estuary account: https://dashboard.estuary.dev/register - Learn more about Microsoft Fabric: https://estuary.dev/blog/what-is-microsoft-fabric/ - Find source connectors to go with your Fabric destination: https://docs.estuary.dev/reference/Connectors/capture-connectors/ - Join us on Slack: https://go.estuary.dev/slack Media resources used in this video are from Pexels and the YouTube Studio Audio Library. 0:00 Introduction 0:26 Microsoft Fabric 1:33 Covering gaps with Estuary 2:33 Beginning connector creation 3:23 Creating a warehouse 3:59 Configuring a service principal 5:59 Creating a storage account 6:55 Wrapping up the connector 7:33 Outro

How to Stream Data to MotherDuck with Estuary (Step-by-Step)
Learn how to load your data into MotherDuck—cloud-based DuckDB—with Estuary. We’ll cover a little about what makes DuckDB unique before diving into a step-by-step demo. Whether you're building a real-time pipeline from NetSuite, Snowflake, BigQuery, PostgreSQL, or MongoDB to MotherDuck, Estuary lets you do it in minutes — no code required. Find more: - Register at Estuary: https://dashboard.estuary.dev/register - Sign up with MotherDuck: https://app.motherduck.com/?auth_flow=signup - Read Estuary’s docs on MotherDuck: https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/ - Follow along with MotherDuck’s tutorial on working with Estuary: https://motherduck.com/blog/streaming-data-to-motherduck/ Media resources used in this video are from Pexels and the YouTube Studio Audio Library. 0:00 Intro 1:01 DuckDB's features 1:45 MotherDuck 2:16 Connector demo with Estuary 2:57 Setting up AWS resources 3:46 Setting up MotherDuck 4:23 Publishing the connector 4:53 Outro
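Once the materialization is running, you can check the results straight from Python with DuckDB's MotherDuck connection string. The database name, token, and table name below are placeholders.

```python
# pip install duckdb
import duckdb

# Placeholder token: create one in the MotherDuck UI and substitute it here.
con = duckdb.connect("md:my_db?motherduck_token=YOUR_TOKEN")

# Once Estuary has materialized a table, query it like any local DuckDB table.
print(con.execute("SHOW TABLES").fetchall())
print(con.execute("SELECT COUNT(*) FROM my_table").fetchone())  # placeholder table name
```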

Stream Data to Apache Iceberg with Estuary
Learn about the Apache Iceberg table format, why it’s essential for organizing your data lake, and how to load data into Iceberg using Estuary. We’ll cover a brief intro to Iceberg before demoing the connector setup with Estuary, Amazon S3, and AWS Glue for real-time and batch data integration. With Estuary, you can stream structured or unstructured data directly into Iceberg tables — whether your source is PostgreSQL, Kafka, Snowflake, MongoDB, or many others — making it easy to build a scalable, query-ready data lakehouse architecture. Find more at Estuary’s: - Website: https://estuary.dev/ - Docs: https://docs.estuary.dev/ - Introduction to Iceberg: https://estuary.dev/apache-iceberg-tutorial-guide/ - Iceberg connector documentation: https://docs.estuary.dev/reference/Connectors/materialization-connectors/amazon-s3-iceberg/ #ApacheIceberg #datalakehouse Media resources used in this video are from Pexels and the YouTube Studio Audio Library. 0:00 Intro 1:00 What is Iceberg? 2:30 Beginning connector setup in Estuary 3:17 AWS resources 5:00 Additional config and catalogs 6:05 Wrapping up connector creation 6:36 Review and outro

Streaming Data Lakehouse Tutorial: MongoDB to Apache Iceberg
Learn how to connect MongoDB to Apache Iceberg tables using Estuary Flow. In this step-by-step demo, we show you how to: 1. Set up a MongoDB source and configure secure connections. 2. Create real-time pipelines to load data into Amazon S3. 3. Leverage the AWS S3 Iceberg Connector with AWS Glue for table cataloging. Estuary Flow simplifies real-time data integration with powerful features like advanced security connections, automated materialization, and streamlined pipeline management. Whether you're handling transactional data or syncing complex data streams, Estuary Flow has you covered. 👉 Try Estuary Flow: https://dashboard.estuary.dev/register 👉 Read the Documentation: https://docs.estuary.dev/ #MongoDBtoIceberg 0:00 - Introduction: Overview of the demo and Estuary Flow. 0:07 - Step 1: Setting Up MongoDB Source: Configuring MongoDB as the data source. 0:44 - Step 2: Reviewing Collections: Selecting collections to sync. 1:03 - Step 3: Setting Up S3 Destination: Configuring the AWS S3 Iceberg connector. 1:37 - Step 4: Testing and Publishing Pipeline: Testing the connection and publishing the pipeline. 2:07 - Final Verification: Verifying MongoDB data in S3 as Iceberg tables.

Integrate Amazon RDS and Snowflake with Estuary
Learn how to integrate Amazon RDS and Snowflake effortlessly using Estuary. In this quick tutorial, we walk you through setting up real-time data movement from an Amazon RDS PostgreSQL database to Snowflake. Explore how to capture live data, configure collections, and materialize data into Snowflake—all in just a few simple steps. Empower your team with real-time analytics using Estuary. Estuary Tutorial: https://estuary.dev/blog/tutorial/ 0:00 – Introduction: Real-time data integration with Estuary 0:11 – Setting up PostgreSQL (RDS) as a source 0:27 – Configuring the Estuary source connector 0:51 – Testing and publishing the source 1:09 – Creating a collection for real-time data 1:36 – Materializing data to Snowflake 2:01 – Testing real-time data ingestion

Estuary Podcast Series: Building Data Pipelines, Analytics, Ad Tech, and Startups
Featuring Chase Cottle, Co-founder and CTO of Eyeball Division, and Andrew James, Co-founder and CEO of Eyeball Division.

Real-time Data Products with Estuary - Ad Performance
In this detailed demo, Dani shows you how to build a real-time data product using Estuary. Learn how to capture and process data from a PostgreSQL database using Change Data Capture (CDC) and stream it into Snowflake for real-time ad performance calculations. This tutorial covers how to join and transform data using Estuary’s derivations and materialize the final output into Snowflake. Follow along as we work with ad clicks and impressions to perform transformations in real-time! 00:00 - Introduction: Real-Time Data Product with Postgres and Snowflake 01:30 - Setting Up Postgres CDC with Estuary Flow 04:45 - Defining Derivations for Real-Time Transformations 09:45 - Materializing Data into Snowflake 12:03 - Real-Time Ad Performance Calculation in Snowflake 12:46 - Conclusion: Wrap-up and Further Resources

PostgreSQL to Iceberg - Streaming Lakehouse Foundations
Stream Real-Time Data from Postgres to Iceberg with Change Data Capture and Estuary Flow.
In this step-by-step tutorial, we demonstrate how to set up and stream real-time data from a PostgreSQL database into Iceberg tables using change data capture (CDC) with Estuary Flow. Learn how to capture, ingest, and materialize data using Estuary Flow's seamless integration. This demo uses a sales database to showcase how changes in a PostgreSQL table are tracked and replicated into an Iceberg table stored in AWS S3. Check out Estuary Flow's Iceberg integration: https://estuary.dev/destination/s3-iceberg/ Join Estuary Flow's community Slack: https://estuary-dev.slack.com/join/shared_invite/zt-86nal6yr-VPbv~YfZE9Q~6Zl~gmZdFQ#/shared-invite/email 00:00 - Introduction: Streaming Data from Postgres to Iceberg 00:18 - Postgres Sales Database Overview 01:08 - Starting Change Data Capture (CDC) with Estuary Flow 02:09 - Materializing Data into Apache Iceberg 04:17 - Backfilling Data into Iceberg 05:21 - Querying Iceberg Tables with Python 06:10 - Conclusion: Demo Recap
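For the "Querying Iceberg Tables with Python" step, one possible approach (not necessarily the exact one used in the video) is PyIceberg with the AWS Glue catalog. The database and table names are placeholders, and AWS credentials are assumed to come from your normal environment configuration.

```python
# pip install "pyiceberg[glue,pandas]"
from pyiceberg.catalog import load_catalog

# Glue catalog configured by the S3 Iceberg materialization; names below are placeholders.
catalog = load_catalog("glue", **{"type": "glue"})

table = catalog.load_table("sales_db.sales")  # placeholder namespace.table
df = table.scan().to_pandas()
print(df.head())
```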

How-to: Change Data Capture for Neon PostgreSQL with Estuary

Change Data Capture for PostgreSQL with Estuary
In this quick tutorial, Dani demonstrates how to effortlessly set up a PostgreSQL Change Data Capture (CDC) pipeline using Estuary in less than a minute. Watch as he works with a live sales table, showing you just how simple it is to connect your Postgres database and start replicating data in real time. You'll also see how Estuary handles schema capturing and backfills your existing data, making real-time data integration both fast and efficient. #PostgresCDC #changedatacapture Try Estuary today for real-time data pipelines and seamless integration with your databases! - Sign up for a free account: https://dashboard.estuary.dev/register - Join our Slack community: https://estuary-dev.slack.com/join/shared_invite/zt-86nal6yr-VPbv~YfZE9Q~6Zl~gmZdFQ#/shared-invite/email - Make sure to check out our Postgres CDC guide: https://estuary.dev/the-complete-change-data-capture-guide-for-postgresql/ Key things covered: 0:00 – Introduction: Setting up PostgreSQL CDC Pipeline 0:09 – Sales Table Example and Real-Time Updates 0:21 – Creating a Postgres Capture in Estuary 0:33 – Verifying the Sales Table and Schema 0:44 – Backfilling Data and Real-Time Replication

Stream Real-Time Data to Databricks with Estuary
Learn how to stream real-time data from PostgreSQL into Databricks using Estuary — no code, no maintenance. In this demo, Dani walks through:
• Setting up a Databricks SQL Warehouse and generating a personal access token
• Capturing the users and transactions tables from PostgreSQL in Estuary
• Materializing those tables directly into Databricks using the built-in connector
• Monitoring live data replication and verifying the results in Databricks SQL Warehouse
Highlights of Estuary:
• Real-time change data capture with millisecond latency
• Native support for Databricks Unity Catalog and Delta Lake
• Zero-code pipeline setup with automatic backfill and continuous sync
Why it matters: streaming live data into Databricks unlocks fresh analytics and real-time dashboards, and keeps ML models fed with up-to-date data, all without complex ETL or scripting.
🔗 Learn more & get started:
• Official Estuary Flow guide: https://estuary.dev/real-time-fraud-detection-databricks/
• https://estuary.dev/blog/load-data-into-databricks/
• Start building for free at: https://dashboard.estuary.dev/register
If you have questions or need help, jump into our community Slack or check the docs. #databrickstutorial #databricks 00:00 - Introduction 00:49 - Materializing Data to Databricks 01:45 - Verifying Data in Databricks 02:04 - Conclusion
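To verify the materialized tables outside the Databricks UI, a short script with the Databricks SQL connector works well. The hostname, HTTP path, token, and table name below are placeholders taken from your own workspace.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholders: copy these from the SQL Warehouse's connection details and the
# personal access token generated in the video.
conn = sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi...",
)

with conn.cursor() as cur:
    # Check that the materialized tables are landing and growing.
    cur.execute("SELECT COUNT(*) FROM main.estuary_demo.transactions")  # placeholder catalog.schema.table
    print(cur.fetchone())

conn.close()
```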

Real-time CDC with MongoDB and Estuary in 3 minutes
Build a Real-Time CDC Pipeline from MongoDB using Estuary: This tutorial demonstrates how to create a real-time change data capture (CDC) pipeline from MongoDB using Estuary. It covers setting up MongoDB Atlas, configuring Estuary, and monitoring data replication in real-time. #MongodbCDC #Changedatacapture Start building for free at: https://dashboard.estuary.dev/register Blog Post MongoDB CDC: https://estuary.dev/mongodb-change-data-capture/ 0:00 – Introduction: Real-Time CDC Pipeline from MongoDB using Estuary 0:07 – Provisioning MongoDB Atlas 0:56 – Creating a Real-Time CDC Pipeline in Estuary 1:17 – Discovering Database Objects for Replication 1:47 – Saving and Publishing the CDC Pipeline 2:36 – Inserting a New Record in MongoDB 3:01 – Verifying Record Update in Estuary
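For context on what the connector does under the hood, CDC tools for MongoDB typically build on change streams. Here is a standalone sketch of watching one with PyMongo, separate from Estuary's connector; the connection string, database, and collection names are placeholders.

```python
# pip install pymongo
from pymongo import MongoClient

# Placeholder connection string: change streams require a replica set (e.g. MongoDB Atlas).
client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net/")
collection = client["shop"]["orders"]  # placeholder database/collection

# Each change event carries the operation type and the changed document,
# which is the raw material a CDC pipeline replicates downstream.
with collection.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
```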

MongoDB to Snowflake in real-time (no Debezium)
In this video, Jeff from Estuary walks you through how to move data from MongoDB to Snowflake using Estuary Flow, a real-time ETL platform. Learn the key benefits of using Estuary, including low-latency Change Data Capture (CDC) and automatic unpacking of nested documents. You'll also see a step-by-step guide to setting up a MongoDB Atlas database and creating a real-time data pipeline with Estuary Flow. Key features covered: - Real-time data replication from MongoDB to Snowflake - Low-latency data movement and automatic flattening of nested documents - Backfilling data and setting up materializations to Snowflake in just a few clicks #MongodbtoSnowflake #changedatacapture If you have any questions, feel free to join our community Slack. Start building real-time data pipelines with Estuary today! Sign up for a free account: https://dashboard.estuary.dev/register Join our Slack community: https://estuary-dev.slack.com/join/shared_invite/zt-86nal6yr-VPbv~YfZE9Q~6Zl~gmZdFQ#/shared-invite/email Blog: https://estuary.dev/mongodb-to-snowflake/ 0:00 – Introduction: Moving Data to Snowflake with Estuary 0:12 – Key Benefits of Using Estuary: Real-Time Data Integration 1:18 – Automatic Flattening of Nested Data 2:08 – Testing Connection to the MongoDB Source 2:25 – Saving and Publishing the Real-Time Pipeline 2:54 – Sending Data to Snowflake and Other Destinations 3:12 – Real-Time Backfill and Data Materialization to Snowflake

Custom ChatGPT Solution Explained in 3 Minutes
ChatGPT? OpenAI? Pinecone? Langchain? Estuary Flow? How do the pieces fit together? https://docs.estuary.dev/reference/Connectors/materialization-connectors/pinecone/ Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline #streaming

How is Idempotency Implemented in Streaming Systems?
In the last video of our Streaming Data Q&A series, Estuary VP of Engineering Phil Fried is going to answer: 0:00 Intro 00:47 How is idempotency implemented in streaming systems? Full video of the podcast is available here: https://youtu.be/pOqQ-0cRWKU Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline #streaming
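As a rough illustration of the idea discussed in the video (not Estuary's implementation), idempotency usually comes down to pairing each event with a unique ID and refusing to apply the same ID twice, so at-least-once redelivery cannot double-count.

```python
# Minimal sketch: in a real system the processed-ID set and the effect would be
# persisted atomically (e.g. in the same database transaction).
processed_ids: set[str] = set()
balance = 0

def apply_event(event: dict) -> None:
    global balance
    if event["id"] in processed_ids:
        return  # duplicate delivery; applying it again would double-count
    balance += event["delta"]
    processed_ids.add(event["id"])

for e in [{"id": "e1", "delta": 10},
          {"id": "e1", "delta": 10},  # redelivered
          {"id": "e2", "delta": 5}]:
    apply_event(e)

print(balance)  # 15, not 25
```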

What data tech stack and how to get started in streaming?
In the third video of our Streaming Data Q&A series, Estuary CEO and Co-Founder David Yaffe is going to answer: 0:00 Intro 00:51 What tech stack would you recommend to non-data engineers? 2:24 How would you get started in streaming? Full video of the podcast is available here: https://www.youtube.com/live/oLWEbBumifU?feature=share Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline #streaming

What are write-ahead logs and what are the gotchas?
In the second video of our Streaming Data Q&A series, Estuary CTO and Co-Founder Johnny Graettinger is going to answer: What are write-ahead logs and what are the gotchas? Full video of the podcast is available here: https://www.youtube.com/live/xbvFTx9eCHQ?feature=share Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline

How is streaming different from batch processing? Incremental processing?
In the first video of our Streaming Data Q&A series, Estuary VP of Engineering Phil Fried is going to answer the following questions: 0:00 Intro 0:49 What is batch processing and in what ways is streaming different from batch? 6:10 What are the challenges of incremental processing in batch jobs? Full video is available at The Geek Narrator's podcast here: https://youtu.be/pOqQ-0cRWKU Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline

Ingesting Data into BigQuery: How to set up a Materialization in Estuary Flow
Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ BigQuery blog: https://estuary.dev/cloud-sql-to-bigquery/ BigQuery sandbox: https://estuary.dev/bigquery-sandbox/
________________________________________________________________________
Building a Pipeline With Estuary Flow. Estuary Flow is a real-time data integration platform that allows you to connect Cloud SQL to BigQuery and other data sources. Estuary is streaming-native and has an intuitive no-code UI that’s quick to use once your data systems meet the prerequisites. Like Dataflow, it's also highly scalable and hands-off once the initial setup is done.
To connect Cloud SQL to BigQuery using Estuary, you'll need to meet the following requirements:
- Google Cloud SQL instance: a running Cloud SQL instance that contains the data you want to transfer to BigQuery.
- Allow connections from Estuary Flow: enable public IP on your database and add the IP address of Estuary Flow (currently 34.121.207.128) as an authorized IP address. Depending on whether your Cloud SQL instance is MySQL, Postgres, or SQL Server, you’ll have to meet a few more requirements to prepare your database; see the MySQL, Postgres, or SQL Server guides.
- A Google Cloud Storage bucket in the same region as the BigQuery dataset.
- A Google Service account with roles/bigquery.dataEditor, roles/bigquery.jobUser, and roles/storage.objectAdmin, plus a generated service account key. See this guide for help.
Once you've met these requirements, you can follow these steps to connect Cloud SQL to BigQuery using Estuary Flow:
1. Log in to your Estuary account, or sign up to get started for free.
2. Go to the create a new capture page of the Estuary web app and select either the MySQL, PostgreSQL, or SQL Server connector, depending on your Cloud SQL database type.
3. Add a unique name for the capture.
4. Provide the Cloud SQL server address, database username (this should be “flow_capture” if you followed the prerequisite steps), and a password. Click the Next button.
5. Flow lists all the tables in your database, which it will convert into Flow data collections described by JSON schema. You can remove any tables you don’t want to capture. Click Save and Publish.
6. On the dialog box showing your capture was successful, click the Materialize Collections button to continue.
7. Choose the BigQuery connector and add a unique name for the materialization.
8. Provide the following details for your BigQuery dataset: Google Cloud project ID, service account JSON credentials (which you generated per the prerequisites), the project’s Google Cloud region, dataset name, and staging Google Cloud Storage bucket name.
9. Scroll down to the Collection Selector. Each table you just captured from Cloud SQL will be mapped to a new table in BigQuery. Provide a name for each (you might choose to use the same names). Optionally, you can modify the collection's schema, determining how it'll be mapped to BigQuery, but that shouldn't be necessary: Flow will output the data in a queryable format in BigQuery tables.
10. Click Next, then click Save and Publish.
All historical data from your Cloud SQL database will be copied to BigQuery. Any new data that appears in Cloud SQL will also be copied to BigQuery in real time. Along the way, data will be cleaned and re-formatted to adhere to BigQuery's data types and valid schema options.
Using this method requires minimal technical expertise, and your data pipeline is backed by schema validation and exactly-once semantics. Additionally, a single data pipeline can sync many (or all) tables in your Cloud SQL database into equivalent BigQuery tables. #bigquery #data #dataengineering #datapipeline
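Once the materialization is published, you can confirm rows are arriving with the BigQuery client library. The project, dataset, and table names below are placeholders, and the script assumes GOOGLE_APPLICATION_CREDENTIALS points at the service-account key created in the prerequisites.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

query = """
    SELECT COUNT(*) AS row_count
    FROM `my-gcp-project.estuary_dataset.orders`
"""  # placeholder dataset and table

for row in client.query(query).result():
    print("rows materialized so far:", row.row_count)
```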

Stateful Streaming with Estuary
Hear Estuary Co-Founder David Yaffe talk about stateful streaming's challenges and advances nowadays with the help of managed solutions like Estuary Flow. Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #sql #sqlqueries

Google BigQuery Explained in 3 Minutes: An Overview
Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ 0:00 Intro 0:12 What is BigQuery? 0:21 Petabytes of data 0:36 Characteristics 0:48 When did BigQuery come about? 1:00 Benefits 1:34 Architecture 2:14 Under the hood 2:49 Ingesting data References: https://estuary.dev/cloud-sql-to-bigquery/ https://en.wikipedia.org/wiki/BigQuery https://cloud.google.com/blog/products/bigquery/bigquery-under-the-hood https://medium.com/google-cloud/bigquery-explained-overview-357055ecfda3 #bigquery
What is BigQuery? BigQuery is Google's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. How much is a petabyte? One petabyte is the equivalent of 20 million tall filing cabinets, or 500 billion pages of standard printed text. It’s A LOT of data! BigQuery is also a Platform as a Service (PaaS) that supports querying using dialects of SQL, and it has built-in machine learning capabilities, which everyone is going after these days.
When did BigQuery come about? BigQuery was announced in May 2010 and made generally available in November 2011. So it’s been around for over a decade!
What are some benefits of using BigQuery? There are many, but here are just a few. With BigQuery, you no longer have to provision and forecast compute and storage resources beforehand; BigQuery allocates all the resources dynamically, based on usage. It also provides super fast analytics at petabyte scale through its unique capabilities and architecture, which we’ll talk about. And since BigQuery uses a columnar data store, you get high data compression with minimized data scanning compared to the usual data warehouse deployments.
What is the BigQuery architecture like? BigQuery’s serverless architecture decouples storage and compute and allows them to scale independently on demand. This structure offers both flexibility and cost controls for users because they don’t need to keep their expensive compute resources up and running all the time. This is very different from traditional node-based cloud data warehouse solutions or on-premise MPP systems. This approach also allows users of any size to bring their data into the data warehouse and start analyzing it using Standard SQL, without worrying about database operations and system engineering.
Under the hood, BigQuery employs a vast set of multi-tenant services driven by low-level Google infrastructure technologies like Dremel, Colossus, Jupiter, and Borg. Compute is Dremel, a large multi-tenant cluster that executes SQL queries. Storage is Colossus, Google’s global storage system. Compute and storage talk to each other through the petabit Jupiter network. BigQuery is orchestrated via Borg, Google’s precursor to Kubernetes.
Ingesting data: at some point, you’ll probably need to ingest data into or out of BigQuery, because every organization today relies on multiple data sources, databases, and data warehouses, so it’s unlikely that BigQuery alone can cover all of your data needs and pipelines. The good news is that BigQuery supports several ways to ingest data into its managed storage. The specific ingestion method depends on the origin of the data. If your data sources are in GCP, some of them support direct exports to BigQuery. However, if your data sources are outside of GCP, or you don’t want to manually handle exports, there are several third-party ETL solutions on the market that can help you ingest data to and from BigQuery fairly easily.
That’s what I’ll be going over in the next video. Stay tuned!

The Customer Just Wants Data Joined - Stream Processing Viewpoint
When people use a stream processing platform, what do they really care about? What's an example of when you may use batch and streaming together? Hear Estuary's Co-Founder David Yaffe's take on it. Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #dataengineering #datapipeline

A Continuous Slack to ChatGPT to Google Sheets Pipeline
Original blog article: https://estuary.dev/gpt-real-time-pipeline/ Code repo: https://github.com/jgraettinger/slack-thread-summaries/tree/main Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ GPT Playlist: https://www.youtube.com/playlist?list=PLWF-evbnGsVxvIjKsL_No_uFSs5XA-SV1 Chapters: 0:00 Intro 0:23 Blog 1:01 Demo 1:46 Context is everything 3:15 Solution 4:31 Slack Capture 5:40 Transformation 6:36 Properties of Derivation 7:37 Materialize 8:53 Other use cases

Build a custom, always-on ChatGPT in 10 minutes, Productionize your AI pipelines
This video teaches how to build your own custom, real-time, free ChatGPT in 10 minutes, using Estuary Flow, Pinecone, OpenAI, Langchain. Scripts used in the video can be found here: https://www.estuary.dev/chatgpt-custom-data/ Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ 0:00 Intro 0:33 ChatGPT Gaps 1:42 Solution 2:15 Prerequisites 3:20 Use Case Ideas 4:57 Set up input 5:24 Set up Estuary Flow Capture 6:45 Pinecone Materialization 6:57 Create Pinecone Index 8:57 Python 10:45 Testing #chatgpt #chatgpt4 #chatgptprompt #ai #artificialintelligence #data #dataengineering #dataengineer

Streaming SQL in 5 Minutes
This video walks through a tutorial on how to transform streaming data with SQL using Estuary Flow within minutes. 0:00 Background 0:13 Use Cases 0:41 Tutorial Scenario 0:58 Flow Tutorial Begins 1:25 GitPod 3:13 Schema Inference 3:58 Publish Catalog 5:11 Validation Preview command to run in terminal: flowctl preview --source flow.yaml --interval 200ms | jq -c 'del(._meta)' Check out our in-depth article on Streaming SQL: https://estuary.dev/streaming-sql Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ While several CDC tools allow users to ingest data from a source to a target, few offer SQL transformation capabilities. Often, replicating data as-is from one place to another may not be sufficient for your needs. For example, you may want to do some filtering, apply certain calculations to your source data, or aggregate data from multiple documents before the data arrive at the destination. Other common use cases include merging across several collections using a common key and applying business logic to the source data. Using derivations in Flow, you can perform a variety of transformations, from a simple remapping use case to a highly complex stateful transaction processing. #data #datapipeline #dataengineering

Stream Processing with SQLite - Stateful Transformation Tutorial
Syncing real-time Wikipedia analysis to a spreadsheet with SQL and Estuary Flow 0:00 Intro 0:22 Scenario 2:17 Tutorial Begins 2:43 GitPod 3:15 Set up flow.yaml 5:24 Migration 6:27 Lambda 10:26 Preview 12:29 Schema Inference 13:22 Publish 13:52 Create Materialization 14:45 Validation Check out our in-depth article on Streaming SQL: https://estuary.dev/streaming-sql Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ At times, the collections generated by a capture may not be suitable for your needs. For instance, you might want to filter certain documents or add calculations to them. Perhaps you need to unpack an array nested inside or aggregate data from many documents. Alternatively, you might need to merge across several collections using a common key, or employ business logic to arrive at a real-time decision. With Flow derivations, you can perform a wide range of transformations, from a simple remapping to complicated, self-referential, and stateful transaction processing. #data #datapipeline #dataengineering
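As a conceptual stand-in for the derivation built in the video (Flow runs its SQLite lambdas inside the runtime, not in a script like this), the core pattern of folding events into persistent SQLite state looks roughly like the following; the event fields are illustrative.

```python
import sqlite3

# State lives in a SQLite file so it survives between batches of events.
state = sqlite3.connect("derivation_state.db")
state.execute("""
    CREATE TABLE IF NOT EXISTS edits_per_user (
        user TEXT PRIMARY KEY,
        edit_count INTEGER NOT NULL,
        bytes_changed INTEGER NOT NULL
    )
""")

def on_event(event: dict) -> None:
    # Fold each incoming change event into the persistent state table.
    with state:
        state.execute(
            "INSERT INTO edits_per_user VALUES (?, 1, ?) "
            "ON CONFLICT(user) DO UPDATE SET "
            "edit_count = edit_count + 1, "
            "bytes_changed = bytes_changed + excluded.bytes_changed",
            (event["user"], event["bytes_changed"]),
        )

for ev in [{"user": "alice", "bytes_changed": 120},
           {"user": "bob", "bytes_changed": -40},
           {"user": "alice", "bytes_changed": 15}]:
    on_event(ev)

print(state.execute("SELECT * FROM edits_per_user ORDER BY user").fetchall())
# [('alice', 2, 135), ('bob', 1, -40)]
```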

Salesforce Change Data Capture Complete Tutorial: 2 Easy Methods
This video teaches 2 easy methods to implement Salesforce Change Data Capture. Commands run in the video can be found in this corresponding step-by-step guide: https://www.estuary.dev/salesforce-change-data-capture-tutorial/ Chapters: 0:00 Salesforce CDC Introduction 1:18 Use Case Example 3:05 Method 1: Use Estuary Flow 6:00 Method 2: Use EMP Connector 10:58 Limitations Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #data #salesforce #dataengineering #datapipeline

PostgreSQL Data Capture Step-by-Step Tutorial
This is a tutorial on how to set up a PostgreSQL data capture using Estuary Flow. https://docs.estuary.dev/reference/Connectors/capture-connectors/PostgreSQL 0:00 Intro 0:52 PostgreSQL Database Set-up 2:45 Estuary Flow Set-up 3:00 Endpoint Config 3:55 Troubleshooting Tip 1: No collection 5:14 Troubleshooting Tip 2: Connection string issue Try Estuary Free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/
PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later. PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the SQL standard and offers many modern features: complex queries, foreign keys, triggers, updatable views, transactional integrity, and multi-version concurrency control. Also, PostgreSQL can be extended by the user in many ways, for example by adding new data types, functions, operators, aggregate functions, index methods, and procedural languages. And because of the liberal license, PostgreSQL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic. #data #postgres #postgresql #datapipeline

PostgreSQL Data Capture - Estuary Demo
How to capture every change event from your source and see in your target what the change events are: insert vs update vs delete. In this demo, we use a PostgreSQL database as our source, and a Google Sheet as our target. Try Estuary Free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/
PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later. PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the SQL standard and offers many modern features: complex queries, foreign keys, triggers, updatable views, transactional integrity, and multi-version concurrency control. Also, PostgreSQL can be extended by the user in many ways, for example by adding new data types, functions, operators, aggregate functions, index methods, and procedural languages. And because of the liberal license, PostgreSQL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic. #data #postgres #postgresql #datapipeline

What are Schema Inference, Write and Read Schemas?
Lesser-known facts about schemas: What is schema inference? What are write and read schemas? When to use both? Try Estuary for free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/
Flow documents and collections always have an associated schema that defines the structure, representation, and constraints of your documents. Collections must have one schema, but may have two distinct schemas: one for when documents are added to the collection, and one for when documents are read from that collection. Schemas are a powerful tool for data quality. Flow verifies every document against its schema whenever it's read or written, which provides a strong guarantee that your collections hold only "clean" data, and that bugs and invalid documents are caught before they can impact downstream data products. In most cases, Flow generates a functioning schema on your behalf during the discovery phase of capture. In advanced use cases, however, customizing your schema becomes more important. Flow performs static inference of the collection schema to verify the existence and types of all keyed document locations, and will report an error if the location could not exist, or could exist with the wrong type. #schema #estuaryflow #data #dataops #dataengineering #datapipeline
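To see the write/read split in miniature, here is an illustrative example using the jsonschema library (not Flow's internal validator): a permissive write schema admits a document that a stricter read schema still rejects. The field names are made up for the example.

```python
# pip install jsonschema
from jsonschema import validate, ValidationError

write_schema = {  # permissive: accept what the source sends
    "type": "object",
    "required": ["id"],
    "properties": {"id": {"type": "integer"}},
}

read_schema = {  # strict: what downstream consumers rely on
    "type": "object",
    "required": ["id", "email"],
    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
}

doc = {"id": 42}

validate(doc, write_schema)      # passes: fine to append to the collection
try:
    validate(doc, read_schema)   # fails: not yet usable downstream without a fix or default
except ValidationError as err:
    print("read-schema violation:", err.message)
```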

Wikipedia Live Demo on Estuary Flow
We have just added a live demo to the Estuary Flow UI, demonstrating how Flow captures change events from the Wikipedia API, transforms the data, and materializes the derived collection to a Google Sheet, all in real time. Try us free: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #wikipedia #estuaryflow #data #dataops #dataengineering #datapipeline

How to Set Up a MongoDB Data Capture on Estuary
This video explains: - Why is MongoDB popular? - When would you want to stream data from MongoDB? - The steps to set up a Data Capture using Estuary's MongoDB Connector 0:00 Mongo Intro 1:44 Tutorial starts Estuary is in public beta now! Get free access to experiment with data streaming in real time: https://estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License which is deemed non-free by several distributions. #estuaryflow #mongodb #data #dataops #dataengineering

Streaming vs Batch Processing in 5 minutes
This video explains: - What is streaming / stream processing? - What is batch processing? - Why and when to use streaming over batch processing? - A real-world example - 6 real-world use cases Estuary Flow is in public beta now! Get free access to experiment with data streaming in real time: https://www.estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ssb/redirect Check out our blog on this topic: https://www.estuary.dev/real-time-and-batch-data-processing-an-introduction/ We're collecting product feedback and are actively developing connectors that our users request. So come try us out! If you need a connector we don't have, reach out to us. We would love to collaborate with you and support your data projects as we continue to refine our product. #data #dataops #dataengineering #estuaryflow

Estuary Overview
Discover the power of Estuary, a platform built to make creating real-time data pipelines easy. In this overview, we’ll show you how Estuary helps you move data from source to destination in real time, with no coding required. 🌐 Check out our website to learn more about Estuary: https://www.estuary.dev/ ➡️ Start building your pipelines for free now: https://dashboard.estuary.dev/register If you’re curious for more, check out our docs or jump into our community Slack to ask questions! 📚 Explore our docs for detailed guides and tutorials: https://docs.estuary.dev/ 💬 Join our Slack community to connect with developers and ask questions: https://estuary-dev.slack.com/ #Estuary #RealtimeETL #DataStreaming #DataOps #dataengineering

Tutorial: How to sync Google Sheets data using Estuary
Estuary is in public beta now! Get free access to experiment with data streaming in real time: https://estuary.dev/ Join our Slack channel with a community of developers: https://estuary-dev.slack.com/ #estuaryflow #googlesheet #data #dataops #dataengineering


Seamless Data Integration, Unlimited Potential
Discover the simplest way to connect and move your data.
Get hands-on for free, or schedule a demo to see the possibilities for your team.


