
Pipe Syntax for SQL: Databricks vs Snowflake vs BigQuery

Explore the differences between Databricks' pipe syntax (|>), Snowflake's flow operator (->>), and BigQuery's pipe syntax (|>). This technical deep dive compares their syntax, use cases, and design philosophies.


In addition to BigQuery, both Databricks and Snowflake have introduced pipe-style SQL syntax that aims to improve query readability and development experience. While they share a visual resemblance (using pipe-like chaining), their purpose, scope, and design philosophy are quite different.

Introduction to Pipe and Flow Syntax

SQL has historically followed a rigid structure that often conflicts with the logical way humans think through queries. This can make SQL verbose and difficult to read, especially in complex analytics scenarios involving multiple subqueries or transformations. To improve readability and usability, Databricks, Snowflake, and Google BigQuery have introduced alternative syntactic models that reimagine how SQL queries can be written.
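
Consider a routine question such as "total sales per category, but only for categories above a threshold." In standard SQL, the aggregation has to be buried in a subquery and read inside-out (the table and column names below are illustrative):

```sql
-- Standard SQL: the first logical step (the aggregation) sits in the
-- middle of the statement, so the query reads inside-out rather than top-down.
SELECT category, total_sales
FROM (
  SELECT category, SUM(amount) AS total_sales
  FROM sales_data
  GROUP BY category
) AS totals
WHERE total_sales > 1000
ORDER BY total_sales DESC;
```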

Databricks' pipe syntax (|>) allows SQL queries to be written in a linear, transformation-first style, similar to the way DataFrame APIs work in PySpark or Pandas. Snowflake's flow operator (->>) enables chaining of full SQL statements, offering a scripting-like model that maintains SQL's declarative nature while supporting procedural execution. Google BigQuery's pipe syntax (|>) closely resembles Databricks' approach, supporting linear, composable queries with additional operators like EXTEND, SET, and DROP.

Design Philosophies and Mental Models

The key differences among the three syntaxes reflect their underlying design philosophies:

  • Databricks |>: Focused on simplifying transformations within a single query block. Inspired by DataFrame-style operations.
  • Snowflake ->>: Targets SQL workflow scripting with full control over execution order and intermediate result usage.
  • BigQuery |>: Expands on Databricks' model with broader operator support and stronger aliasing capabilities.

Each syntax represents a different mental model for SQL development:

  • Databricks and BigQuery support a declarative, transformation-first approach.
  • Snowflake supports a procedural, statement-by-statement model.

Databricks Pipe Syntax (|>) – Key Concepts

  • Inspired by: DataFrame APIs (e.g., PySpark, Pandas)
  • Primary Benefit: Readability and clarity by mimicking logical transformation steps
  • Structure Example:
```sql
FROM lineitem
|> WHERE price < 100
|> SELECT country, price * quantity AS total
|> AGGREGATE SUM(total) AS grand_total GROUP BY country
```

Operator Highlights:

  • AGGREGATE: Combines aggregation and GROUP BY, reducing duplication
  • EXTEND: Appends computed columns
  • SET, DROP, RENAME: Modify columns of the intermediate result inline (see the sketch after this list)
  • Can be composed incrementally; perfect for exploratory or iterative analytics
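
A minimal sketch of how these column operators chain together; the table and column names are illustrative, not from a real schema:

```sql
FROM orders
|> EXTEND price * quantity AS line_total  -- append a computed column
|> SET status = UPPER(status)             -- replace an existing column's value
|> DROP internal_note                     -- remove a column from the result
|> RENAME line_total AS order_total       -- rename a column in place
```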

Adoption: Available in Apache Spark 4.0+ and Databricks Runtime 16.2+. Also used in GoogleSQL (BigQuery) and ZetaSQL.

Snowflake Flow Operator (->>) – Key Concepts

  • Inspired by: Unix-style piping and scripting logic
  • Primary Benefit: Simplified execution of multi-statement scripts
  • Structure Example:
```sql
SHOW WAREHOUSES
->> SELECT "name", "state", "size" FROM $1
->> SELECT * FROM $2 WHERE "state" = 'RUNNING';
```

Capabilities:

  • Supports all SQL statements including DDL, DML, SHOW, etc.
  • Uses $1, $2, etc., to reference previous step outputs (only in FROM clauses)
  • Ideal for scripting-style logic with sequential dependencies (see the sketch below)
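
As a sketch of that sequential style, the chain below filters the output of a SHOW command in two steps; the database and schema names are hypothetical:

```sql
SHOW TABLES IN SCHEMA analytics.public
->> SELECT "name", "rows" FROM $1 WHERE "rows" = 0  -- filter the SHOW output
->> SELECT COUNT(*) AS empty_tables FROM $2;        -- summarize the previous step
```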

Limitations:

  • No reuse outside the chain (see the sketch after this list)
  • No positional bind variable support
  • Only available within Snowflake Scripting blocks or interactive sessions
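
To work around the first limitation, an intermediate result has to be persisted explicitly inside the chain. A minimal sketch, assuming a CTAS statement can reference $2 in its FROM clause (the table name is illustrative):

```sql
SHOW WAREHOUSES
->> SELECT "name", "state" FROM $1 WHERE "state" = 'RUNNING'
->> CREATE OR REPLACE TEMP TABLE running_warehouses AS SELECT * FROM $2;

-- $1 and $2 are out of scope once the chain ends; query the persisted table.
SELECT * FROM running_warehouses;
```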

Example Use Case Comparison

Goal: Find sales totals per item category, then filter by total sales.

Databricks |>

```sql
FROM sales_data
|> AGGREGATE SUM(amount) AS total_sales GROUP BY category
|> WHERE total_sales > 1000
|> ORDER BY total_sales DESC;
```

Snowflake ->>

```sql
SELECT category, SUM(amount) AS total_sales
FROM sales_data
GROUP BY category
->> SELECT * FROM $1 WHERE total_sales > 1000
->> SELECT * FROM $2 ORDER BY total_sales DESC;
```

BigQuery |>

```sql
-- Compute total sales by item.
FROM mydataset.Produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(*) AS num_sales
   GROUP BY item;
```

Community Commentary & Critiques

Franck Pachot and other SQL experts have offered nuanced critiques:

Databricks-style Pipe Criticisms:

  • Misleads developers into thinking SQL is a streaming language (like Unix pipes)
  • Breaks traditional declarative abstraction of SQL
  • Encourages bad habits like omitting table aliases
  • May confuse optimizer execution vs. query author intent

Snowflake-style Flow Criticisms:

  • Less intuitive for simple, one-off queries
  • Verbose referencing via $n can get messy in long chains
  • Doesn’t solve SQL verbosity; more about structuring large scripts

Databricks’ pipe syntax enhances in-query readability; Snowflake’s flow operator supports cross-statement orchestration. They address different problems and are not functionally interchangeable.

GoogleSQL and BigQuery Context

  • GoogleSQL pipe syntax (|>) closely mirrors Databricks’ structure
  • Used in BigQuery, Spanner, Procella, and ZetaSQL
  • Adds additional operators like EXTEND, SET, DROP, and RENAME
  • Prioritizes developer ergonomics and auto-completion friendliness
  • Considered part of a broader shift toward linear, expressive SQL construction

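Example (a sketch of |> AS aliasing combined with EXTEND and JOIN; the Regions table and its region_id and region_name columns are hypothetical):

```sql
FROM mydataset.Produce
|> WHERE sales > 0
|> EXTEND sales * price AS revenue   -- computed column; original columns kept
|> AS p                              -- alias the intermediate result
|> JOIN mydataset.Regions AS r ON p.region_id = r.region_id
|> SELECT p.item, r.region_name, p.revenue;
```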

When to Use Which

Use Databricks Pipe Syntax if:

  • You are building complex SELECT queries with nested logic.
  • You prefer a linear, DataFrame-like mental model.
  • You need improved readability within query blocks.

Use Snowflake Flow Operator if:

  • You need to orchestrate a series of dependent SQL statements.
  • Your use case involves DDL, DML, or administrative commands.
  • You want to script a multi-step workflow within SQL.

Use BigQuery Pipe Syntax if:

  • You are writing ad-hoc or exploratory analytics queries.
  • You value stepwise readability and rich transformation operators.
  • You work within the GoogleSQL ecosystem and want consistent patterns.

Final Thoughts

The introduction of pipe and flow syntax across Databricks, Snowflake, and BigQuery reflects a broader shift in SQL usability and expressiveness. Each syntax extension targets a different set of developer needs: Databricks and BigQuery improve how we write and maintain single queries, while Snowflake enhances how we orchestrate SQL workflows.

Understanding the distinct capabilities of each syntax will help teams adopt the right approach for their use cases. Whether you're optimizing for readability, reusability, or procedural clarity, modern SQL has a syntax to match.

FAQs

Can I mix pipe or flow syntax with standard SQL?

Yes. All three platforms (Databricks, Snowflake, and BigQuery) allow mixing their respective pipe or flow syntax with standard SQL. This makes it easier to gradually adopt the new style or enhance existing queries incrementally.

Does pipe or flow syntax change query performance?

No. The underlying execution plans are the same as long as the logic is equivalent. These syntaxes are designed for better readability and developer experience, not performance optimization.

Which syntax should I use for multi-statement workflows?

Use Snowflake's ->> flow operator if you're chaining multiple SQL statements across different types (e.g., DML, DDL, SHOW). For complex transformations within a single query, Databricks |> or BigQuery |> provide a more readable and modular approach, especially for analysts and data scientists.


About the author

Dani Pálma, Head of Data Engineering Marketing

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.
