How to Export Data From Firestore: Top Formats & Destinations
Google Cloud Firestore is a top-tier NoSQL database for real-time operations. But because its documents are schemaless and loosely structured, it can be challenging to extract or export data from Firestore. This can put a serious damper on your data strategy.
Some of the very features that make Firestore so great make it hard to migrate or sync Firestore documents to other systems. And the fact that it’s part of the Google Cloud ecosystem puts your data in a bubble.
In this post, we’ll run through the most popular file formats and destination systems for your Firestore data. We’ll cover methods from one-off exports to real-time synchronization. And we’ll link out to in-depth resources for each method.
Why Use Firestore? (And When to Migrate Data Out)
Let’s start with the basics, shall we?
If you’re not already using the database for your app or project, you might be wondering who should use Firestore to begin with, and why you’d want to export data from Firestore.
My previous post on analyzing Firestore data goes into this in detail, but we can sum it up as follows:
- Firestore is a NoSQL database with a loose data structure that is great for building scalable applications with fast indexing. (Think mobile applications with features like chat.)
- Firestore is not a relational database, so it doesn't offer the enforced schemas, joins, or integrity constraints of most RDBMS.
- Firestore is also not an analytical storage solution, so it’s a poor choice for aggregated analytical workloads, data science, BI, and the like.
So, it’s possible you want to completely migrate your database from Firestore due to a change in priorities.
But more likely, Firestore is serving you quite well for a single purpose, but you also need to do other things with the same data. For that, you need that data to be available in different storage systems.
In other words, you need your data stack to be integrated around a single source of truth.
But it’s not so simple.
Firestore’s unique features include:
- Document-oriented data storage.
- An extremely flexible data model.
Firestore documents are sets of JSON-esque key-value pairs. They don’t have schemas, and they support complex nesting.
This means you can’t simply grab an orderly table out of Firestore and be off to the races.
You have to get creative.
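To make the schemaless point concrete, here is a minimal Python sketch that treats a few documents from one collection as plain dicts. The `users` documents are hypothetical sample data, not output from a Firestore API call. The sketch reports the union of top-level fields, which fields each document lacks, and which fields hold different types in different documents; all three are decisions a table-shaped destination forces you to make.

```python
from typing import Any

# Hypothetical documents from a single "users" collection. Firestore does not
# enforce a schema, so each document may carry a different set of fields.
docs: dict[str, dict[str, Any]] = {
    "alice": {"name": "Alice", "age": 31, "address": {"city": "Oslo"}},
    "bob": {"name": "Bob", "tags": ["admin", "beta"]},
    "carol": {"name": "Carol", "age": "unknown"},  # same field, different type
}


def field_union(documents: dict[str, dict[str, Any]]) -> list[str]:
    """Return the sorted union of top-level field names across all documents."""
    fields: set[str] = set()
    for body in documents.values():
        fields.update(body)
    return sorted(fields)


def missing_fields(documents: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """For each document ID, list the union fields that the document lacks."""
    columns = field_union(documents)
    return {
        doc_id: [col for col in columns if col not in body]
        for doc_id, body in documents.items()
    }


def type_conflicts(documents: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Map each field seen with more than one Python type to those type names."""
    seen: dict[str, set[str]] = {}
    for body in documents.values():
        for key, value in body.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: kinds for key, kinds in seen.items() if len(kinds) > 1}


if __name__ == "__main__":
    print(field_union(docs))
    print(missing_fields(docs))
    print(type_conflicts(docs))
```

Every empty cell and every mixed-type column in the eventual table traces back to one of these three reports.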
Without further ado, here are some of the most common Firestore-to-destination data pipelines you might need, and how you can set one up using mostly no-code methods.
Firestore to BigQuery
Why export data from Firestore to BigQuery?
You have an app running on Firestore, and you also want that data in a data warehouse for analysis. Since it’s also in the Google ecosystem, BigQuery is by far the most common data warehouse choice in this scenario.
How to do it:
Method 1: Google provides a two-step batch process.
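- Export from Firestore with the managed import and export service. To load the result into BigQuery, the export must target specific collections (a collection ID filter); exports without one can't be loaded.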
- Load exports into BigQuery.
The managed export can only write data as LevelDB log files, an open-source key/value log format that originated at Google and is rarely used elsewhere.
That makes it hard to upload raw Firestore exports anywhere other than BigQuery.
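As a rough sketch of Method 1, the Python below only builds the two command lines; it does not run them. It assumes the `gcloud` and `bq` CLIs are installed and authenticated, and the bucket, prefix, collection, and dataset names are hypothetical placeholders. The metadata-file path follows the layout Google documents for collection-filtered exports, but check it against what actually lands in your bucket before loading.

```python
import shlex

# Hypothetical placeholders: replace with your own bucket, prefix, and names.
BUCKET = "my-firestore-exports"
PREFIX = "2024-export"
COLLECTION = "users"
DATASET_TABLE = "analytics.users"


def export_command(bucket: str, prefix: str, collection: str) -> list[str]:
    """Step 1: a managed Firestore export of a single collection.

    BigQuery can only load exports created with a collection filter,
    so --collection-ids is set explicitly.
    """
    return [
        "gcloud", "firestore", "export",
        f"gs://{bucket}/{prefix}",
        f"--collection-ids={collection}",
    ]


def metadata_uri(bucket: str, prefix: str, collection: str) -> str:
    """Path to the per-collection .export_metadata file that BigQuery loads."""
    kind = f"kind_{collection}"
    return (
        f"gs://{bucket}/{prefix}/all_namespaces/{kind}/"
        f"all_namespaces_{kind}.export_metadata"
    )


def load_command(dataset_table: str, source_uri: str) -> list[str]:
    """Step 2: load the export into BigQuery as a DATASTORE_BACKUP source."""
    return [
        "bq", "load",
        "--source_format=DATASTORE_BACKUP",
        dataset_table,
        source_uri,
    ]


if __name__ == "__main__":
    uri = metadata_uri(BUCKET, PREFIX, COLLECTION)
    # Print shell-ready strings; pass the lists to subprocess.run(...) to execute.
    print(shlex.join(export_command(BUCKET, PREFIX, COLLECTION)))
    print(shlex.join(load_command(DATASET_TABLE, uri)))
```

Keeping each command as a list rather than one string avoids shell-quoting surprises if you later hand it to `subprocess.run`.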
Method 2: Use a real-time pipeline provider.
Estuary’s cloud-based data integration platform, Flow, can capture data change events from Firestore and move them to BigQuery in real time.
Flow stores your Firestore documents as regular JSON (more below). You can push that data just as easily to a different (non-Google) data warehouse, like Snowflake.
- How to create any Data Flow: an adaptable tutorial you can use to connect Firestore to BigQuery.
- Full post on connecting Firestore to Snowflake using Flow.
Firestore to JSON
Why export data from Firestore to JSON?
JSON is an excellent intermediate data format, so it’s a great choice if you want to leave your options open or move your exported Firestore data to various destinations.
JSON is also the data format most similar to Firestore documents. With JSON, you get a minimally changed copy of the Firestore data that other applications can consume.
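"Minimally changed" still means some changes: Firestore has value types, such as timestamps and bytes, that JSON lacks. Here is a minimal sketch using Python's standard `json` module, with plain `datetime` and `bytes` values standing in for what a Firestore client returns (an assumption; the official Python client typically hands back timestamps as a `datetime` subclass). It encodes timestamps as ISO 8601 strings and bytes as base64; the `FirestoreJSONEncoder` and `to_json` names are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Any


class FirestoreJSONEncoder(json.JSONEncoder):
    """Encode non-JSON types that commonly appear in Firestore documents.

    Only datetime and bytes are handled here; other Firestore types
    (GeoPoint, DocumentReference) would need their own branches.
    """

    def default(self, o: Any) -> Any:
        if isinstance(o, datetime):
            # Timestamps become ISO 8601 strings, keeping the UTC offset.
            return o.isoformat()
        if isinstance(o, bytes):
            # Bytes become base64 text so the output stays valid JSON.
            return base64.b64encode(o).decode("ascii")
        return super().default(o)


def to_json(doc_id: str, body: dict[str, Any]) -> str:
    """Serialize one document, keeping its ID alongside the nested data."""
    return json.dumps({"_id": doc_id, **body}, cls=FirestoreJSONEncoder, sort_keys=True)


if __name__ == "__main__":
    doc = {
        "createdAt": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
        "avatar": b"\x89PNG",
        "profile": {"city": "Oslo", "langs": ["en", "no"]},
    }
    print(to_json("alice", doc))
```

Nested maps and arrays pass through untouched, which is why JSON stays so close to the original document shape.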
How to do it:
Method 1: Real-time data pipeline to cloud storage using Estuary Flow.
As mentioned above, Estuary Flow is a managed data pipeline tool that allows you to capture real-time updates to Firestore collections.
Your Flow data is stored as JSON in a cloud storage bucket that you own. You can push that data to another system, like a data warehouse, or access it directly in your bucket.
Learn more about how Flow models Firestore data into JSON.
Method 2: Local export using Firefoo.
Firefoo is a GUI tool to import, export, and explore Firebase data on your local machine. You can choose between traditional nested JSON and flattened, newline-separated JSON.
This is a batch-based workflow, so you must re-run it periodically to keep the export up to date.
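The two JSON layouts differ in a simple way. This sketch is plain Python, not Firefoo itself, and Firefoo's exact output may differ; it writes the same hypothetical documents both ways: one nested JSON object keyed by document ID, and newline-delimited JSON (NDJSON) with one compact document per line.

```python
import json
from typing import Any

# Hypothetical documents keyed by Firestore document ID.
docs: dict[str, dict[str, Any]] = {
    "alice": {"name": "Alice", "address": {"city": "Oslo"}},
    "bob": {"name": "Bob", "tags": ["admin"]},
}


def to_nested_json(documents: dict[str, dict[str, Any]]) -> str:
    """One JSON object for the whole collection, keyed by document ID."""
    return json.dumps(documents, indent=2, sort_keys=True)


def to_ndjson(documents: dict[str, dict[str, Any]]) -> str:
    """Newline-delimited JSON: one compact object per document, ID included."""
    lines = (
        json.dumps({"_id": doc_id, **body}, separators=(",", ":"), sort_keys=True)
        for doc_id, body in documents.items()
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_nested_json(docs))
    print(to_ndjson(docs), end="")
```

NDJSON is convenient for large collections because a reader can process one document per line instead of parsing the whole file at once.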
Firestore to CSV
Why export data from Firestore to CSV?
Like JSON, CSV is another extremely versatile and widely used data format.
If your destination is a spreadsheet, like Google Sheets or Microsoft Excel, you’ll of course use CSV. CSV can also be a quick and easy intermediate file for smaller Firestore collections.
How to do it:
Method 1: Real-time pipeline to Google Sheets using Estuary Flow.
Flow offers a path for this, as well. Create a real-time Data Flow from Firestore to Google Sheets, and export the sheet as CSV as needed.
Method 2: Local export using Firefoo.
This is exactly the same method described above for JSON: Firefoo allows you to export Firestore collections as CSV in a batch fashion.
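Whichever tool you use, nested documents have to be flattened before they fit CSV's rows and columns. One common convention, assumed here (Firefoo and other tools may name columns differently), turns nested map keys into dot-separated column names and serializes arrays as JSON strings. A minimal sketch with Python's standard `csv` module; the `flatten` and `to_csv` names are hypothetical:

```python
import csv
import io
import json
from typing import Any


def flatten(body: dict[str, Any], parent: str = "") -> dict[str, Any]:
    """Flatten nested maps into dot-separated keys, e.g. address.city."""
    flat: dict[str, Any] = {}
    for key, value in body.items():
        column = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column))
        elif isinstance(value, list):
            # Arrays have no natural CSV shape, so keep them as JSON text.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


def to_csv(documents: dict[str, dict[str, Any]]) -> str:
    """Write documents to CSV, using the union of flattened fields as the header."""
    rows = [{"_id": doc_id, **flatten(body)} for doc_id, body in documents.items()]
    header = ["_id"] + sorted({col for row in rows for col in row} - {"_id"})
    buffer = io.StringIO()
    # Fields a document lacks (schemaless data) are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {
        "alice": {"name": "Alice", "address": {"city": "Oslo", "zip": "0150"}},
        "bob": {"name": "Bob", "tags": ["admin", "beta"]},
    }
    print(to_csv(sample), end="")
```

The header is built from the union of all flattened fields, so every document gets a row even when its shape differs from the others.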
Firestore to PubSub
Why export data from Firestore to PubSub?
Google Cloud PubSub is an event bus commonly used to ferry messages between applications and services in real time. It’s especially useful within the Google ecosystem.
Say you want a data change event in your Firestore-backed app to trigger another action in a separate app built using some other framework. PubSub can help connect the two, if you have the engineering skills.
How to do it:
Method 1: Google Cloud Functions.
Cloud Functions is Google’s serverless environment for connecting its various cloud services in just about any way you want.
Set up a Cloud Firestore data change as a trigger. Then, write an event-driven function that runs on that trigger and publishes a message to a PubSub topic.
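Here is a hedged sketch of the function body, assuming a 1st-gen Python Cloud Function with a Firestore trigger. In that setup the event arrives as a dict whose `value` and `oldValue` entries hold the document in Firestore's typed REST format (for example `{"stringValue": "Oslo"}`); 2nd-gen functions receive a CloudEvent in a different format, so this decoding step would change. The sketch only decodes the event and builds the message bytes. The `build_message` name and payload shape are hypothetical. The publish itself, which in the `google-cloud-pubsub` client is `PublisherClient().publish(topic_path, data)`, is left out so the example runs with the standard library alone.

```python
import json
from typing import Any


def decode_value(value: dict[str, Any]) -> Any:
    """Convert one Firestore REST-typed value into a plain Python value."""
    if not value:
        return None
    if "mapValue" in value:
        return decode_fields(value["mapValue"].get("fields", {}))
    if "arrayValue" in value:
        return [decode_value(v) for v in value["arrayValue"].get("values", [])]
    if "integerValue" in value:
        return int(value["integerValue"])  # int64 values arrive as strings
    if "nullValue" in value:
        return None
    # stringValue, booleanValue, doubleValue, timestampValue, referenceValue, ...
    # are passed through as-is (timestamps and references stay strings).
    return next(iter(value.values()))


def decode_fields(fields: dict[str, Any]) -> dict[str, Any]:
    return {key: decode_value(val) for key, val in fields.items()}


def change_type(event: dict[str, Any]) -> str:
    """Classify the write from which document snapshots the event carries."""
    had_old = bool(event.get("oldValue"))
    has_new = bool(event.get("value"))
    if had_old and has_new:
        return "update"
    if has_new:
        return "create"
    if had_old:
        return "delete"
    raise ValueError("event carries neither an old nor a new document")


def build_message(event: dict[str, Any]) -> bytes:
    """Build the bytes you would pass as `data` to a PubSub publish call."""
    # For deletes, only the old snapshot exists, so its fields are sent.
    snapshot = event.get("value") or event.get("oldValue") or {}
    payload = {
        "change": change_type(event),
        "document": snapshot.get("name"),
        "fields": decode_fields(snapshot.get("fields", {})),
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    sample_event = {
        "oldValue": {},
        "value": {
            "name": "projects/demo/databases/(default)/documents/users/alice",
            "fields": {
                "name": {"stringValue": "Alice"},
                "age": {"integerValue": "31"},
                "address": {"mapValue": {"fields": {"city": {"stringValue": "Oslo"}}}},
            },
        },
    }
    print(build_message(sample_event).decode("utf-8"))
```

Decoding to plain JSON before publishing means subscribers in other frameworks never have to understand Firestore's typed value format.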
Method 2: Use Estuary Flow as an intermediary.
Like PubSub, Estuary Flow is an event-based platform that moves data in real time. Unlike PubSub, you don’t need to set up triggers or write code.
To connect Firestore to PubSub with Flow, simply create a new Data Flow. Choose Firestore as the source and PubSub as the destination.
Getting Data From Firestore in Real Time: Performance Considerations
Hopefully you’re feeling prepared to start moving data from Firestore to your destination or file type of choice.
But before we wrap up, it’s important to discuss the performance — and by extension, the cost — of the methods we discussed.
Some of these methods are either batch methods or repeat exports of entire collections. This means that each time you go to get the latest data from Firestore, even if only one document has changed, you have to read the entire collection or database to figure that out.
Firestore bills for every document read (as well as for writes, deletes, and storage), and every exported document counts as a read.
As you can see, this can add up really fast for large datasets, or if you need to refresh your data export frequently.
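A back-of-the-envelope comparison makes this concrete. The Python below counts document reads per month for repeated full exports versus an incremental approach that reads everything once and then reads only changed documents. The collection size, refresh rate, change volume, and price per 100,000 reads are all hypothetical inputs; plug in your own numbers and your region's current Firestore pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    collection_size: int       # documents in the collection
    refreshes_per_day: int     # how often a full export is repeated
    changed_docs_per_day: int  # documents created or updated per day
    days: int = 30


def full_export_reads(w: Workload) -> int:
    """Every refresh re-reads every document, changed or not."""
    return w.collection_size * w.refreshes_per_day * w.days


def incremental_reads(w: Workload) -> int:
    """One initial backfill, then one read per changed document."""
    return w.collection_size + w.changed_docs_per_day * w.days


def cost(reads: int, price_per_100k: float) -> float:
    """Reads are priced per 100,000 documents."""
    return reads / 100_000 * price_per_100k


if __name__ == "__main__":
    w = Workload(collection_size=1_000_000, refreshes_per_day=24, changed_docs_per_day=10_000)
    price = 0.06  # hypothetical USD per 100,000 reads; check current pricing
    for label, reads in [("full", full_export_reads(w)), ("incremental", incremental_reads(w))]:
        print(f"{label}: {reads:,} reads, ~${cost(reads, price):,.2f}")
```

With these made-up inputs, hourly full exports cost 720 million reads a month against 1.3 million for the incremental approach. The gap shrinks when the collection is small or refreshed rarely.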
When you use an event-based, real-time method to extract data from Firestore, you only read new or changed documents after the initial sync.
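To illustrate the event-based side, here is a standard-library sketch of a destination that applies change events one at a time. The `ADDED`, `MODIFIED`, and `REMOVED` names mirror the change types that Firestore client libraries report to snapshot listeners (for example, `DocumentChange.type` in the Python client). The `ChangeEvent` and `Replica` classes are hypothetical stand-ins so the example runs without a Firestore connection.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class ChangeType(Enum):
    ADDED = "ADDED"
    MODIFIED = "MODIFIED"
    REMOVED = "REMOVED"


@dataclass(frozen=True)
class ChangeEvent:
    kind: ChangeType
    doc_id: str
    data: Optional[dict[str, Any]] = None  # None for removals


@dataclass
class Replica:
    """A destination copy kept in sync by applying change events."""

    rows: dict[str, dict[str, Any]] = field(default_factory=dict)
    events_applied: int = 0  # work scales with changes, not collection size

    def apply(self, event: ChangeEvent) -> None:
        self.events_applied += 1
        if event.kind is ChangeType.REMOVED:
            self.rows.pop(event.doc_id, None)
        else:
            # ADDED and MODIFIED both carry the full current document.
            self.rows[event.doc_id] = dict(event.data or {})


if __name__ == "__main__":
    replica = Replica()
    stream = [
        ChangeEvent(ChangeType.ADDED, "alice", {"name": "Alice"}),
        ChangeEvent(ChangeType.ADDED, "bob", {"name": "Bob"}),
        ChangeEvent(ChangeType.MODIFIED, "alice", {"name": "Alice", "age": 31}),
        ChangeEvent(ChangeType.REMOVED, "bob"),
    ]
    for event in stream:
        replica.apply(event)
    print(replica.rows, replica.events_applied)
```

The replica never rescans the collection: each update costs one event, no matter how many unchanged documents sit alongside it.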
So, the advantages of real-time data export from Firestore are twofold:
- Extremely low latency: Changes in Firestore are shown at the destination in seconds or milliseconds.
- Dramatic cost savings: Since you’re not reading your entire database regularly, you can save hundreds or thousands of dollars each year, depending on the database size.
On the other hand, a batch or repeat export might be fine if:
- Your Firestore database is small.
- You’re exporting once, or infrequently.
There are endless possibilities for where you might want to move your Firestore data. Today, we covered just a few.
For a few more, check out Estuary Flow’s supported destination systems — you can push Firestore data to any of these using the same method described throughout this article.
Create your first Data Flow by signing up here — no strings attached.
And as always, let us know your thoughts, questions, and requests in the comments below or on Slack.