As we navigate our digital environments on a daily basis, data silently influences our decisions and the way we experience things. Just like how we understand different words and gestures in real life, Google BigQuery uses different data types to give meaning to these online experiences. The BigQuery data types might seem small and easy to miss, but they help us understand all the information on the internet, making it more meaningful and useful.

While there is no denying that BigQuery data types play a major role in data integrity and efficiency, the catch is that BigQuery isn't very forgiving when it comes to misaligned data types. Mixing them up can generate unexpected errors, skewed results, and a general feeling of "why isn't this working as it should?"

Now, if this situation sounds like something you are experiencing and desperately want to get out of, you are at the right place. Over the next 10 minutes, we’ll break down the different BigQuery data types to help you understand how to handle your data when working with BigQuery, and to simplify the process of tackling data migration tasks. 

What Is Google BigQuery?

Blog Post Image

Image Source

BigQuery is Google’s fully managed enterprise data warehouse. It is designed to quickly analyze large datasets using features like machine learning, geospatial analysis, and business intelligence

One of its key features is its serverless architecture which doesn’t require managing any infrastructure. Instead, you can directly use SQL to dig deep into your data. It provides rapid querying capabilities and can handle terabytes of data in seconds.

BigQuery has a unique approach – it keeps the compute engine (which does the analysis) separate from storage. This design means you can either store data within BigQuery itself or analyze your data where it currently resides. The platform provides:

  • Streaming to continuously update data
  • Federated queries to pull data from other sources
  • Advanced tools like BigQuery ML for deeper data insights

What Are The 9 Different BigQuery Data Types?

Blog Post Image

Image Source

Google BigQuery has different data types for various datasets and use cases. Understanding these is important if you are looking to maximize the efficiency of BigQuery’s warehousing solutions. Using BigQuery data types not only has certain performance implications, but also supports maintaining data integrity.

With BigQuery supporting everything from basic numerical values to intricate arrays or structs, it gives you greater flexibility in data processing. So, let’s explore the different data types in detail.

Numeric Data Type in BigQuery

Blog Post Image

Image Source

The numeric data type represents any data that can be quantified numerically, including whole numbers, fractions, and numbers with decimal points. In databases like Google BigQuery, these types are important for arithmetic operations, comparisons, and statistical computations. They allow for precise representation and manipulation of quantitative data. There are 3 primary numeric data types in BigQuery:

Integer (INT64)

This data type represents whole numbers without any fractional component. 

Example:

plaintext
SELECT 100 AS IntValue;

In this simplistic representation, we’re identifying a number, 100, as a whole number, emphasizing its nature as an integer.

Float (Float64)

Floats are used for numbers that have both whole and fractional components. They give an approximation rather than an exact value which is something to be cautious of when dealing with precise calculations.

Example:

plaintext
SELECT 3.14159 AS ApproximatePi;

Pi is a number that goes on indefinitely. But for many applications, rounding it to a few decimal places (as an approximation) suffices.

Numeric

This data type is analogous to the ‘Decimal’ type in many database systems. The Numeric type can store valid values with 38 fractional digits of precision and 9 decimal digits of scale, like valid values from 00. This makes it perfect for exact calculations like those required in financial operations.

Example:

plaintext
SELECT NUMERIC '25.6789012345' AS PreciseValue;

Here, the Numeric data type guarantees every decimal is accounted for.

Arithmetical/Mathematical Operations

You can perform basic arithmetic operations on numeric data types. It includes addition, subtraction, multiplication, and division. For example, you might calculate the area of a circle using arithmetic/mathematical operations:

plaintext
WITH CircleData AS (  SELECT 7 AS Radius ) SELECT Radius, 3.14159 * Radius * Radius AS Area FROM CircleData;

Output

Blog Post Image

Handling Division Errors

Dividing by zero or performing operations that yield non-finite values can disrupt your SQL queries. BigQuery provides the IEEE_divide function to handle these scenarios. This function returns ‘NaN’ (Not-a-Number) when a division by zero occurs, so your query runs without hitches.

plaintext
WITH DivisionData AS (  SELECT 100 AS Numerator, 0 AS Denominator ) SELECT Numerator, Denominator, IEEE_Divide(Numerator, Denominator) AS Result FROM DivisionData;

Output

Blog Post Image

Safe Functions

In some cases, you might want your query to return NULL instead of an error. For instance, the logarithm of a negative number is mathematically undefined. Using the ‘SAFE’ prefix with your operations returns a NULL value instead of an error.

plaintext
SELECT SAFE.SQRT(-4) AS SquareRootValue;

Output

Blog Post Image

Comparisons

Numeric data types in BigQuery support standard comparison operators. These comparisons are necessary, especially when filtering data using the WHERE clause. When you understand the sorting order of values in BigQuery, it becomes easier to filter and analyze your data. For instance:

WITH Numbers AS(

  SELECT 5 AS Value

  UNION ALL SELECT -3

  UNION ALL SELECT 8

)

SELECT * FROM Numbers 

WHERE Value > 0

ORDER BY Value ASC;

Output

Blog Post Image

Precise Decimal Calculations With Numeric

When every decimal point matters, BigQuery’s numeric type shines. Especially in financial and accounting sectors where miscalculations, even by the tiniest of margins, can be catastrophic. For example:

plaintext
WITH FinancialData AS (  SELECT NUMERIC '10.25' AS Price,         NUMERIC '0.07' AS TaxRate ) SELECT Price, TaxRate, Price * TaxRate AS TaxAmount FROM FinancialData;

Output

Blog Post Image

Boolean Data Type in BigQuery

The Boolean data type in BigQuery represents variables using 2 distinct keywords: TRUE and FALSE. Unlike some languages where the representation might be 0 and 1, BigQuery uses these explicit keywords which makes it easier to read and understand. Let’s look at the core functions of Boolean.

Logical Operations

  • They use logical operators such as AND, OR, and NOT.
  • These operations help in refining the results based on certain conditions.
  • The primary tool used for filtering with logical operations is the WHERE clause.
  • The outcome when employing a WHERE clause paired with a logical operator is dependent on the WHERE condition being TRUE or FALSE.

Conditional Operations

  • BOOLEAN values play an important role when working with conditional clauses.
  • These clauses make the queries more efficient as they give both an option and an alternative when the condition isn’t met.
  • A typical example of this is the IF clause which operates on the given condition, returning different results based on whether the condition evaluates to TRUE or FALSE.

Exploring Boolean Through A Unique Code Example

Suppose we have a dataset of students and we want to determine if they passed or failed based on their scores. Let’s also introduce a Boolean field has_attendance indicating if the student has met the attendance criteria.

plaintext
WITH student_data AS( SELECT "John" AS Name, 85 AS Score, TRUE AS has_attendance UNION ALL SELECT "Doe", 45, TRUE UNION ALL SELECT "Jane", 60, FALSE) SELECT Name, Score,       IF(has_attendance AND Score >= 50, "Passed", "Failed") AS Result FROM student_data

Output

Blog Post Image

In this example, the IF clause is used to determine if a student passed or failed. A student is considered to have ‘Passed’ only if they meet the attendance criteria (‘has_attendance’ is TRUE) and their score is 50 or above. Otherwise, they are marked as ‘Failed’.

Strings Data Types in BigQuery

Strings in BigQuery represent sequences of Unicode characters with variable lengths. They play an important role in data handling, and with BigQuery, you get a rich set of functions to improve your string manipulation tasks. Remember that strings must always be enclosed within single, double, or triple quotation marks.

Core Functions & Operations On Strings

To efficiently handle strings in BigQuery, different functions are provided. Here’s a detailed overview:

Cast

This command is crucial for data type conversions. For instance, you can convert a string to an INT64 or FLOAT64, and likewise, an integer or float can be converted back to a string. Here’s an example:

plaintext
SELECT CAST (25 AS STRING) AS StringRepresentation, CAST('100' AS INT64) AS IntegerRepresentation

Safe_cast

Sometimes, direct conversion using cast might fail. In such scenarios, safe_cast comes to the rescue by returning NULL so your query still runs without problems. For instance, converting the string ‘hello’ to a boolean data type would fail. But with safe_cast, you’d get a NULL instead.

plaintext
SELECT CAST('false' AS bool) AS RegularCast, SAFE_CAST ('hello' AS bool) AS SafeCastResult

Format

This command lets you shape the appearance of the data being converted. For instance, you can control the precision of a float or pad numbers with zeros. Here’s a query example:

plaintext
SELECT FORMAT ('%4d', 320) AS FormattedA, FORMAT ('%6.2f', 44.4444) AS FormattedB, FORMAT ('%04d', 91) AS FormattedC

Output

Blog Post Image

Concat

Concat is an invaluable tool when you want to merge parts of different strings. Let’s say you have a name and city column and you want to combine them:

plaintext
WITH name_city AS ( SELECT 'Alice' AS name, 'London' as city UNION ALL SELECT 'Bob', 'Paris' UNION ALL SELECT 'Charlie', 'Berlin' ) SELECT name, city, CONCAT( name, ' lives in ', city) AS description FROM name_city

Output

Blog Post Image

Regular Expressions

BigQuery provides robust support for regex operations that increase your string search capabilities. Some of the main functions include:

  • REGEXP_CONTAINS: Detects if a regex pattern or character is present.
  • REGEXP_EXTRACT: Extracts the matched regex pattern.
  • REGEXP_REPLACE: Replaces the matched regex pattern.

For example, to check which strings match a pattern of 3 or more letters:

plaintext
SELECT text, REGEXP_CONTAINS (text, r'[a-zA-Z]{3,}') AS MatchStatus FROM (SELECT * from unnest(['abc', 'xy', 'defgh']) AS text )

Output

Blog Post Image

Bytes Data Types in BigQuery

Bytes, as a data type in BigQuery, serve a unique purpose. Unlike strings, which are one or two-digit characters, they are a sequence of raw data often used to store binary information.

Representation Of Bytes

  • Always prefixed with the letter 'B' or 'b' to indicate the byte nature of the data.
  • Enclosed within single, double, or even triple quotation marks.
  • Should never be swapped with strings despite the familiar look.

Let's say you have a database with a table named ‘music_files’. Each row represents a music file, and there’s a column named ‘AudioSignature’ which contains a byte representation of the audio signature for matching purposes. 

SELECT * FROM music_files 

WHERE AudioSignature = b'9aF3Zx8J12LnP7Qr';

In this query, we’re looking to retrieve music files with a specific audio signature. The byte value b'9aF3Zx8J12LnP7Qr' serves as our filter. It's important to note the prefix ‘b’, indicating that we are dealing with a byte literal and not a string.

Time & Date Data Types in BigQuery

Google Cloud’s BigQuery has a range of data types to cater to different requirements when working with date and time. These data types help in representing different moments or periods which can be crucial for analysis and reporting in BigQuery.

Date Data Type

Represents an absolute point in time, denoting a valid calendar date that is independent of any time zone.

  • Typical format: YYYY : [ M ] M : [ D ] D.
  • Represents a valid calendar date that is independent of any time zone.
  • A single date can depict different periods if logged in various time zones.

For instance, the date ‘2023-08-25’ in BigQuery’s Date data type would signify the same calendar day across the globe even if the exact moment it refers to varies depending on the local timezone.

TimeData Type

Denotes a specific time devoid of any particular date. If you were to specify “14:30:00” in the Timedata type, it would just be a representation of the time 2:30 PM without any reference to which day it belongs.

Datetime Data Type

  • Points to a specific moment
  • It disregards any precision beyond that particular moment.

For instance, if you observe ‘2023-08-25 14:30:00’ in the DateTime data type, it gives a picture of that specific minute on that specific day without any finer granularity or timezone context.

Timestamp Data Type

  • DateTime is enclosed within Timestamp, albeit with a lower precision level.
  • Captures an exact moment with microsecond precision irrespective of geographical location.

To put it simply, while DateTime might tell you it’s 2:30 PM on August 25, 2023, Timestamp would provide added details like ‘2023-08-25 14:30:00.123456’.

Timezone Data Types in BigQuery

Timezones, as another major BigQuery data type, play an important role when it comes to parsing and displaying timestamps. In BigQuery, there are 2 primary formats to represent timezones.

Offset From Coordinated Universal Time (UTC)

  • In this format, there should be no spaces between the timezone and the rest of the timestamp.
  • The UTC offset format represents the difference in hours and minutes from the Coordinated Universal Time.

For instance:

2020-05-15 10:45:00.00-3:00 

Here, -3:00 indicates that the time is 3 hours behind UTC.

Time Zone Name From TZ Database

  • This format uses the time zone names from the TZ database which is a comprehensive collection of time zone information.
  • There must be a space between the timezone name and the rest of the timestamp.

For example:

2020-05-15 10:45:00.00 America/New_York 

Here, America/New_York specifies the timezone corresponding to New York, USA.

Geography (GIS) Data Type in BigQuery

Bigquery’s Geography data type is adept at representing geographical entities like points, paths, and shapes on Earth’s surface. Its alignment with the WGS84 reference ellipsoid, which is common in GPS systems, facilitates direct usage of latitude and longitude to pinpoint locations in BigQuery.

Key Functions & Use-cases

Constructors

They create new geographical values. For instance:

plaintext
ST_GEOGFROMPOINT(latitude, longitude)

Parsers

These functions interpret external formats like WKT and GeoJSON and transform them into Geography data types:

plaintext
ST_PARSEGEOG(wkt_string)

Formatters

Acting as the reverse of Parsers, they convert Geography data types into known external formats. For instance:

plaintext
ST_TOGEOGTEXT(geography_value)

Transformations

They derive new Geography data types from pre-existing ones. Consider:

plaintext
ST_COMBINE(geography_1, geography_2)

Predicates

These are evaluators that return TRUE or FALSE based on spatial attributes or relationships. For example:

plaintext
ST_ISWITHIN(geography_1, geography_2, distance)

Accessors

They grant insights into geographical data properties. To retrieve the total number of points in a geographical entity:

plaintext
ST_TOTALPOINTS(geography_value)

Measures

Used for gauging metrics between geographical entities, like:

plaintext
ST_MEASUREDISTANCE(geography_1, geography_2)

Aggregate Functions

They perform collective operations specific to geography data. For example:

plaintext
ST_COMBINEALL(geography)

Array Data Type in BigQuery

In BigQuery, the array data type is your go-to for storing ordered collections of elements. While these collections can have any data type, they should not be arrays themselves. Arrays in BigQuery find representation via angle brackets (< and >).

Let’s look at array declarations

Standard Arrays 

BigQuery accommodates most data types within an array. Examples include:

plaintext
ARRAY<Double 64> ARRAY<String>

Handling NULL

If a query is to return an array with only NULLs, BigQuery will display an error. However, such arrays can be used internally in your queries.

Nested Arrays

Direct nesting of arrays is not permissible, so ARRAY<ARRAY<INT 32>> isn’t valid.

Using Struct With An Array 

To achieve nesting, add a struct. An example:

plaintext
ARRAY<Struct<ARRAY<INT 32>>>

Struct Data Type in BigQuery

The struct data type can have different fields, each with its designated data type and optionally a name. In BigQuery, structs are demarcated using angle brackets (< and >).

Let’s take a look at struct declarations.

Standard Structs

Structs in BigQuery can have a wide range of data types. An example is:

plaintext
STRUCT<INT 32>

Nested Structs

One Struct can reside within another, as shown by:

plaintext
STRUCT<y STRUCT<'a' String, b INT 32>>

Struct with an array

A struct can carry an array:

plaintext
STRUCT<ARRAY<STRING>>

Handling NULL

Structs in BigQuery are receptive to NULL values or fields comprising NULL values.

Moving BigQuery Data In Real-Time With Estuary Flow

Blog Post Image

 

Google’s BigQuery is an advanced tool for managing huge volumes of data. But sometimes businesses have to transfer this data to other systems and doing so quickly and accurately can be a challenge. While BigQuery has tools for this, they might not be the fastest option. This is where Estuary Flow comes into play, making data transfers efficient and timely — without having to worry about maintenance, schema, coding, or the different BigQuery data types.

Estuary Flow enables real-time data movement to and from BigQuery. Flow is our managed data integration platform built for real-time analytics. It lets you move and transform data between databases, warehouses, lakes, and other systems with millisecond latency. It achieves millisecond freshness through change data capture (CDC) from BigQuery. 

Let us explore some of the key capabilities of Estuary Flow:

  • Live monitoring: Offers live reporting and monitoring to ensure uninterrupted data flow.
  • Exactly-once semantics: Guarantees transactional consistency for precise data views.
  • Materializations: Offers low-latency views across desired systems with native real-time operations.
  • Schema inference: Converts unstructured data into structured data for better data management.
  • Powerful transformations: Supports both SQL and JavaScript for real-time streaming transformations.
  • Scalability: A distributed system that can handle data volumes up to 7 GB/s and can backfill large data sets quickly.
  • Optimized data handling: Minimizes system load with low-impact CDC, distributing data across all systems without re-syncing.
  • Data capture: Seamlessly captures data from different sources, including real-time Change Data Capture (CDC) from databases and integrations with SaaS applications.

Key Takeaways

The wide range of Google’s BigQuery data types lets you store and process different datasets with precision. When you pick the right data type, you can enhance data integrity, system performance, and analytical capabilities. For many organizations, the next step is moving BigQuery data to other systems in real-time for additional processing or analytics.

Estuary Flow provides a robust platform for achieving millisecond fresh BigQuery data in downstream tools. Its seamless change data capture from BigQuery combined with streaming ETL enables real-time data pipeline orchestration. 

With pre-built connectors, automatic schema management, data consistency features, and enterprise-grade scalability, Estuary Flow simplifies your real-time data integration challenges. To experience its capabilities firsthand, you can sign up for free today. You can also contact our team for additional assistance in implementing real-time BigQuery data movement.

Start streaming your data for free

Build a Pipeline