
Air pollution is a growing public health and environmental concern for many urban regions in Nepal. Rapid urbanization, increased vehicular traffic, industrial activity, and seasonal factors such as temperature inversions, pressure, and wind speeds contribute to fluctuating air quality levels across cities. This article analyzes Air Quality Index (AQI) patterns across major cities in Nepal and examines how specific pollutants, along with weather conditions like temperature and wind speed, influence overall air quality.
Air quality data is inherently time-sensitive, and pollution levels can change rapidly due to weather, traffic, industrial activity, or seasonal effects. Meaningful analysis often depends on understanding what's happening right now, as well as how current conditions compare to recent history. This makes air quality data a strong fit for right-time pipelines, where data freshness is configurable rather than fixed at either the real-time or batch extreme.
In this article, we'll walk through a practical, end-to-end workflow for building an AI-ready air quality analytics pipeline using Estuary and MotherDuck. Estuary ingests hourly air pollution data for major cities in Nepal from the OpenWeather API via an HTTP Webhook capture and streams it into MotherDuck, a DuckDB-powered analytics warehouse. Prompt-based AI workflows within MotherDuck then transform the aggregated metrics into human-readable weekly air quality summaries for each city, producing conversational output suitable for reporting and RAG applications.
Key Takeaways
Right-time pipelines ensure air quality data remains fresh and actionable.
MotherDuck + DuckDB provides fast, low-overhead analytics without heavy infrastructure.
Prompt-based SQL enables explainable AI summaries grounded directly in data.
The architecture scales naturally to additional cities, pollutants, and AI agents.
On Estuary
Estuary is a right-time data platform designed to unify cost, latency, and deployment flexibility within one dependable managed system. Instead of stitching together separate tools for change data capture, batch ingestion, streaming pipelines, and application synchronization, Estuary provides a single platform that supports analytics, operational workloads, and AI use cases across a wide range of latencies and environments.
When working with air quality data at scale, the objective is not just to collect measurements, but to:
- Analyze trends as they evolve
- Explore anomalies quickly
- Operationalize insights for dashboards, alerts, and AI-driven explanations
To support these workflows, data must be clean, structured, and continuously usable. Raw API responses alone are not sufficient; they need to be normalized, stored reliably, and made available to downstream systems with predictable freshness. Estuary ensures seamless integration of air quality API data into MotherDuck through continuous ingestion and real-time synchronization of the latest measurements.
Users can set up seamless data pipelines from multiple sources to MotherDuck, ensuring efficient and timely data availability for analysis. This integration leverages Estuary's robust data movement capabilities to enhance the performance and utility of MotherDuck’s DuckDB-based architecture.
On MotherDuck
MotherDuck is a managed, serverless cloud service built on top of DuckDB that brings the same fast, lightweight query engine into a shared cloud environment. It preserves DuckDB’s simplicity and performance while adding the features needed for team-based production analytics:
- Serverless architecture: No servers to provision or manage. MotherDuck automatically scales to meet analytical workloads.
- Cost-effective: Well-suited for analytical workloads without the cost profile of large data warehouse platforms.
- Collaboration: Teams can share queries, datasets, and results in real time, working from a single source of truth.
- Secure secret storage: Credentials and external connections are managed safely in the cloud.
- Pipeline integration: Platforms such as Estuary can write directly into MotherDuck, ensuring analytics tables remain continuously up to date.
Why We Chose MotherDuck
While building air quality analytics and AI-driven explanations, we considered several popular data platforms, including Snowflake, Databricks, BigQuery, and Postgres. Each of these systems is powerful, but most proved either too expensive, too complex, or too operationally heavy for a workload centered on fast analytics, iterative exploration, and downstream AI use cases.
MotherDuck allows us to process, aggregate, and unify the incoming air quality data into a fast, reliable analytical dataset. Because MotherDuck is built on DuckDB, it's fast enough for both exploration and reporting. With MotherDuck handling analytics, we can layer on LLM-based agents that explain air quality trends using up-to-date data.
Overview of Our Pipeline
Datasets Used:
- Air Quality Data (OpenWeather Air Pollution API): The primary dataset used in this workflow comes from the OpenWeather Air Pollution API, which provides hourly city-level measurements for major air pollutants. This API aggregates data from multiple monitoring stations and models, making it suitable for regional and cross-city analysis. This project focuses on cities across Nepal.
- Current Weather Data (OpenWeather Weather API): To enrich the air quality measurements with environmental context, we also use weather data from the OpenWeather Weather API. This secondary dataset allows us to correlate air quality patterns with meteorological factors, such as low wind speeds or lack of rainfall, that can influence pollution levels.
Prerequisites:
- Estuary: Sign up for a free trial of Estuary, which will be used to ingest API data via HTTP/Webhooks and stream it into MotherDuck.
- OpenWeather API Key: Create an account with OpenWeather to access both the Air Pollution API and Weather API.
- Python Environment: Any IDE capable of running Python scripts will work for calling the OpenWeather APIs and sending data to Estuary’s HTTP Webhook endpoint.
- MotherDuck Account: Sign up for a 21-day free trial of MotherDuck, which will serve as the analytical warehouse and AI-ready storage layer. Retrieve your MotherDuck access token / API key from the account settings after sign-up.
- AWS S3 Account (Staging): An Amazon S3 bucket is used for staging or intermediate storage when loading data into MotherDuck.
Let's begin!
Step-by-Step Guide: Streaming API Data into MotherDuck Using Estuary
Step 1: Configure Estuary’s HTTP Webhooks Capture
1. Click on New Capture and search for HTTP Webhook.
2. Name your Capture: For example, air-quality-data. Then, create your own Authentication token.
3. Add the webhook ID name: air-quality-data. Then test and Save to Collection.
4. Once your changes are saved, click on your capture to see its Public Endpoint:
By clicking the link, you can test it:
You can directly use the following curl command to add API values into your POST link:
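A command along these lines works; the endpoint, token, and sample payload below are placeholders to replace with the values shown on your capture's page:

```shell
curl -X POST "https://<your-public-endpoint>/air-quality-data" \
  -H "Authorization: Bearer <your-auth-token>" \
  -H "Content-Type: application/json" \
  -d '{"city": "Kathmandu", "aqi": 3}'
```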
Save the above Webhook URL and Authentication Token securely, as both will be used later. Then create a second Estuary Webhook source the same way for the weather data.
Step 2: Build a Python Ingestion Script to Send Air Quality and Weather API Data to Estuary via HTTP/Webhooks
Air Quality API and Weather API Data: After signing up for a free OpenWeather account, retrieve your API key from the account dashboard and save it in a .env file or somewhere secure.
We'll be looking at 5 cities in Nepal, using two OpenWeather endpoints:
- The Air Pollution API returns historical-through-current air quality data, so we can pull a week of hourly records for all five cities in one pass.
- The Current Weather API only returns present conditions (it does not store history), so we poll it hourly with a Python script to accumulate historical weather data over time.
I. Fetch Air Quality Data:
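A minimal sketch of the air quality fetcher is below. The city list, coordinates, environment-variable names, and webhook URL are assumptions; substitute your own values.

```python
# fetch_nepal_air_quality.py -- illustrative sketch, not the article's exact script.
import json
import os
import urllib.request

# Environment-variable names are assumptions; set them to your own values.
OPENWEATHER_KEY = os.environ.get("OPENWEATHER_API_KEY", "")
ESTUARY_WEBHOOK = os.environ.get("ESTUARY_WEBHOOK_URL", "")
ESTUARY_TOKEN = os.environ.get("ESTUARY_AUTH_TOKEN", "")

# Five Nepali cities with approximate coordinates.
CITIES = {
    "Kathmandu": (27.7172, 85.3240),
    "Pokhara": (28.2096, 83.9856),
    "Lalitpur": (27.6644, 85.3188),
    "Biratnagar": (26.4525, 87.2718),
    "Bharatpur": (27.6833, 84.4333),
}

def air_pollution_url(lat, lon, api_key):
    """Build the OpenWeather Air Pollution endpoint URL for one location."""
    return ("http://api.openweathermap.org/data/2.5/air_pollution"
            f"?lat={lat}&lon={lon}&appid={api_key}")

def shape_record(city, payload):
    """Flatten one API response into a single document for Estuary."""
    entry = payload["list"][0]
    return {
        "city": city,
        "timestamp": entry["dt"],        # unix seconds
        "aqi": entry["main"]["aqi"],     # OpenWeather's 1-5 index
        **entry["components"],           # pm2_5, pm10, no2, o3, so2, co, ...
    }

def post_to_estuary(record):
    """POST one JSON document to the Estuary HTTP webhook capture."""
    req = urllib.request.Request(
        ESTUARY_WEBHOOK,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {ESTUARY_TOKEN}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()

if __name__ == "__main__" and OPENWEATHER_KEY:
    for city, (lat, lon) in CITIES.items():
        with urllib.request.urlopen(
                air_pollution_url(lat, lon, OPENWEATHER_KEY)) as resp:
            post_to_estuary(shape_record(city, json.load(resp)))
```

Each document carries the city name and unix timestamp so that downstream tables can be grouped per city per hour.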
II. Fetch Weather Data:
fetch_nepal_weather.py
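A sketch of the weather fetcher follows; environment-variable names and the record layout are assumptions chosen to mirror the air-quality script.

```python
# fetch_nepal_weather.py -- illustrative sketch, not the article's exact script.
import json
import os
import urllib.request

# Environment-variable names are assumptions; set them to your own values.
OPENWEATHER_KEY = os.environ.get("OPENWEATHER_API_KEY", "")
WEATHER_WEBHOOK = os.environ.get("ESTUARY_WEATHER_WEBHOOK_URL", "")
ESTUARY_TOKEN = os.environ.get("ESTUARY_AUTH_TOKEN", "")

CITIES = {
    "Kathmandu": (27.7172, 85.3240),
    "Pokhara": (28.2096, 83.9856),
    "Lalitpur": (27.6644, 85.3188),
    "Biratnagar": (26.4525, 87.2718),
    "Bharatpur": (27.6833, 84.4333),
}

def weather_url(lat, lon, api_key):
    """Current-weather endpoint; units=metric returns Celsius and m/s."""
    return ("https://api.openweathermap.org/data/2.5/weather"
            f"?lat={lat}&lon={lon}&units=metric&appid={api_key}")

def shape_weather(city, payload):
    """Keep just the fields we later correlate against air quality."""
    return {
        "city": city,
        "timestamp": payload["dt"],
        "temp_c": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "wind_speed": payload["wind"]["speed"],
        "clouds_pct": payload["clouds"]["all"],
    }

def post_to_estuary(record):
    """POST one JSON document to the weather webhook capture."""
    req = urllib.request.Request(
        WEATHER_WEBHOOK,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {ESTUARY_TOKEN}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()

if __name__ == "__main__" and OPENWEATHER_KEY:
    for city, (lat, lon) in CITIES.items():
        with urllib.request.urlopen(
                weather_url(lat, lon, OPENWEATHER_KEY)) as resp:
            post_to_estuary(shape_weather(city, json.load(resp)))
```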
Run Cron job to run both scripts hourly:
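A crontab entry along these lines runs both scripts at the top of every hour; the interpreter path, script paths, and log locations are placeholders:

```shell
# Run both ingestion scripts hourly; adjust paths to your environment.
0 * * * * /usr/bin/python3 /path/to/fetch_nepal_air_quality.py >> /var/log/air_quality.log 2>&1
5 * * * * /usr/bin/python3 /path/to/fetch_nepal_weather.py >> /var/log/weather.log 2>&1
```

Staggering the second job by a few minutes keeps the two API calls from competing for the same rate-limit window.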
.env:
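A .env file along these lines keeps secrets out of the scripts; the variable names are illustrative, and you'd load them with python-dotenv or export them into the shell:

```shell
OPENWEATHER_API_KEY=your_openweather_api_key
ESTUARY_WEBHOOK_URL=https://<your-public-endpoint>/air-quality-data
ESTUARY_WEATHER_WEBHOOK_URL=https://<your-public-endpoint>/weather-data
ESTUARY_AUTH_TOKEN=your_estuary_token
```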
Output: once the cron job is running, you'll see hourly captured documents accumulating in both the weather and air quality collections in Estuary.
Step 3: Setting Up Estuary with MotherDuck
I. Configure AWS S3 Staging (Secure Intermediate Storage)
Before data can be written into MotherDuck, Estuary requires a temporary staging location. We'll use Amazon S3 for this purpose.
- In your AWS account, create an S3 Bucket for Staging (for example: motherduck-staging), and optionally create a folder inside the bucket. Copy the S3 URL, as it will be required during MotherDuck materialization setup.
- Navigate to IAM → Users.
- Create or use an IAM user that has read and write access to the S3 bucket.
- Attach a policy with at least the following actions:
- s3:PutObject
- s3:GetObject
- s3:ListBucket
- Select your IAM user, open the Security credentials tab, and click Create access key.
- Save both the access key and secret key securely. These credentials allow Estuary to write staged data into S3.
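The policy attached to the IAM user can be as narrow as the following sketch; the bucket name here assumes the motherduck-staging example above:

```shell
# iam-policy.json -- minimal S3 staging permissions (illustrative)
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     {
#       "Effect": "Allow",
#       "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
#       "Resource": [
#         "arn:aws:s3:::motherduck-staging",
#         "arn:aws:s3:::motherduck-staging/*"
#       ]
#     }
#   ]
# }
```

Note that s3:ListBucket applies to the bucket ARN itself, while the object actions apply to the `/*` path.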
II. Create MotherDuck Account and Retrieve MotherDuck API Keys
Sign up for a MotherDuck account, and go to Integrations → Access Tokens → Click on Create Token and save it securely.
Step 4: Create a MotherDuck Materialization in Estuary
In Estuary, go to Destinations → click on New Materialization → Type MotherDuck.
Fill in the following fields:
- MotherDuck Service Token: Paste the service token retrieved earlier from MotherDuck.
- Database: Create a database in MotherDuck and add it here (For example: air_quality_db).
- Database Schema - Add a database schema name.
- For Staging Bucket Configuration, choose S3 and add the following:
- S3 Staging Bucket name: Add your S3 bucket name from earlier (For example: motherduck-staging).
- Access Key ID and Secret Access Key: Paste the values you saved earlier.
- AWS S3 Bucket Region: Add the region of your S3 bucket.
Link your Capture - Add the two Webhooks captures:
Output:
Step 5: Performing Analytical Queries in MotherDuck
Querying Air Quality Data:
The table shown above displays hourly air quality records for Kathmandu. Each row represents:
- One city
- One timestamp
- Pollutant measurements (PM2.5, PM10, NO₂, O₃, etc.)
- Calculated AQI
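A query along these lines produces that view; the database, table, and column names below assume the materialization settings used earlier and may differ in your setup:

```sql
-- Last 24 hourly readings for one city (names are illustrative)
SELECT city,
       to_timestamp(timestamp) AS observed_at,
       aqi, pm2_5, pm10, no2, o3
FROM air_quality_db.main.air_quality_data
WHERE city = 'Kathmandu'
ORDER BY observed_at DESC
LIMIT 24;
```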
Querying Weather Data:
These are weather measurements such as temperature, humidity, wind speed, and cloud cover captured hourly. Because the OpenWeather weather endpoint does not provide historical data, these values are accumulated over time via hourly polling.
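A sketch of the equivalent weather query, again assuming illustrative table and column names:

```sql
-- Most recent polled weather observations per city (names are illustrative)
SELECT city,
       to_timestamp(timestamp) AS observed_at,
       temp_c, humidity, wind_speed, clouds_pct
FROM air_quality_db.main.weather_data
ORDER BY observed_at DESC
LIMIT 24;
```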
Step 6: Enriching Data - Dominant Pollutant & Health Classification
WHO and EPA Health Classification
The World Health Organization’s Air Quality Guidelines (AQG) set a global target for national, regional, and city governments working to improve their citizens’ health by reducing air pollution. WHO offers global baseline recommendations, making them ideal for cross-country or multi-city analysis (e.g., comparing Nepal, India, and the U.S.).
The Environmental Protection Agency (EPA) is a federal US agency responsible for protecting human health and the environment through enforceable regulations. For air quality, the EPA defines the Air Quality Index (AQI) to translate pollutant concentrations into categories such as Good, Moderate, Unhealthy, and Hazardous. EPA classifications are widely used in operational dashboards and public alerts, making them practical for real-time decision-making with specific consideration for sensitive groups (children, elderly, people with asthma).
Using WHO and EPA health classifications to enrich air-quality data adds public-health context to raw pollutant measurements, transforming numbers into actionable insight. On their own, AQI values or pollutant concentrations are hard to interpret.
Health categories give LLMs and rule-based systems a semantic layer: “PM2.5 exceeded WHO limits but remains within EPA ‘Moderate’ range.” This could be useful for various use cases such as RAG pipelines, automated summaries, and alert agents.
Here's what the following query shows:
- A CASE expression that identifies the pollutant with the highest average concentration per day.
- Health categories derived using official thresholds:
- EPA (U.S.) 24-hour standards
- WHO 2021 daily guidelines
- Each day is labeled as:
- Good / Moderate / Unhealthy (EPA)
- Compliant / Above WHO Guidelines (WHO)
This converts numeric sensor data into human-interpretable health signals.
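A sketch of such a query is below. The table and column names are assumptions, and the cutoffs shown are the commonly cited PM2.5 values (EPA 24-hour breakpoints; WHO 2021 daily guideline of 15 µg/m³); verify them against the official tables before relying on the labels:

```sql
WITH daily AS (
  SELECT
    city,
    DATE_TRUNC('day', to_timestamp(timestamp)) AS day,
    AVG(pm2_5) AS avg_pm25,
    AVG(pm10)  AS avg_pm10,
    AVG(no2)   AS avg_no2,
    AVG(o3)    AS avg_o3
  FROM air_quality_data
  GROUP BY 1, 2
)
SELECT
  city,
  day,
  -- Dominant pollutant: whichever daily average is highest
  CASE GREATEST(avg_pm25, avg_pm10, avg_no2, avg_o3)
    WHEN avg_pm25 THEN 'PM2.5'
    WHEN avg_pm10 THEN 'PM10'
    WHEN avg_no2  THEN 'NO2'
    ELSE 'O3'
  END AS dominant_pollutant,
  -- EPA-style daily PM2.5 category (illustrative breakpoints)
  CASE
    WHEN avg_pm25 <= 9.0  THEN 'Good'
    WHEN avg_pm25 <= 35.4 THEN 'Moderate'
    ELSE 'Unhealthy'
  END AS epa_category,
  -- WHO 2021 daily PM2.5 guideline
  CASE WHEN avg_pm25 <= 15 THEN 'Compliant'
       ELSE 'Above WHO Guidelines'
  END AS who_category
FROM daily
ORDER BY city, day;
```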
Step 7: Generating AI-Ready Summaries Using prompt()
One great feature within MotherDuck is the prompt function, which allows you to interact with Large Language Models (LLMs) directly from SQL. You can generate both free-form text and structured data outputs.
Example of an AI-driven insight: “Air quality worsened due to PM2.5 exceeding the World Health Organization’s 24-hour guideline, increasing short-term respiratory risk, particularly for sensitive populations.”
MotherDuck’s built-in LLM function converts structured SQL results into natural-language summaries. These summaries are grounded strictly in database values, can be indexed for RAG applications, and are suitable for dashboards, alerts, and reports.
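A minimal sketch of generating such summaries with prompt(); the source table name and column set are assumptions standing in for whatever daily or weekly aggregate you built in the previous step:

```sql
SELECT
  city,
  prompt(
    'In two sentences, explain this week''s air quality for a general audience. '
    || 'City: ' || city
    || ', average PM2.5: ' || ROUND(avg_pm25, 1) || ' ug/m3'
    || ', dominant pollutant: ' || dominant_pollutant
  ) AS weekly_summary
FROM weekly_air_quality_summary;
```

Because the prompt is built entirely from column values, each generated sentence stays grounded in the underlying rows.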
Read more on prompt() in the MotherDuck documentation.
Another powerful feature of MotherDuck is automatic query correction: when a query fails with a syntax error, the UI suggests a fixed version, so you can write and refine SQL without getting stuck on typos.
Conclusion
In this tutorial, we built a complete, production-ready pipeline for air quality analytics and AI-driven interpretation using Estuary and MotherDuck. By combining right-time data ingestion with fast analytical querying and embedded LLM capabilities, we moved beyond static dashboards toward explainable, continuously updated insights. This approach is well-suited for public health monitoring, environmental reporting, and AI-powered decision support systems where data freshness and interpretability matter as much as accuracy.

About the author
Ruhee has a background in Computer Science and Economics and has worked as a Data Engineer at SaaS tech startups, where she automated ETL processes using cutting-edge technologies and migrated data infrastructures to the cloud with AWS/Azure services. She is currently pursuing a Master’s in Business Analytics with a focus on Operations and AI at Worcester Polytechnic Institute.