
What are AI Agents?
Throughout history, humans have looked for tools and techniques to simplify work and improve efficiency. An AI agent is software that can act on its own, driven by data. AI agents are the frontier of automation efforts, with many use cases for knowledge workers.
AI agents driven by Large Language Models (LLMs) are particularly adept at dealing with digital data requiring interpretation and a level of reasoning. You may have used or built AI agents with technologies such as OpenClaw, Claude Code or Cowork, Workspace Agents in ChatGPT or Gemini Agent. Companies want to leverage AI agents to improve productivity, starting with knowledge workers.
What does Generative AI do?
At its foundation, the goal of Generative AI (GenAI for short) is to predict a sequence of tokens. For example, in text-based scenarios, GenAI predicts fragments of words, one at a time, to form a sentence. Since the introduction of ChatGPT in November 2022, LLMs have improved significantly in quality. However, even though GenAI sentences are generally well-formed, free of spelling errors, and confident-sounding, the output may still be wrong. Such an incorrect but confident response is called a hallucination.
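The token-by-token prediction loop can be sketched with a toy model. The bigram table below is a made-up stand-in for a real LLM, which would score every token in its vocabulary at each step:

```python
# Minimal sketch of greedy next-token prediction.
# BIGRAMS is a hypothetical toy model, not real LLM data.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        scores = BIGRAMS.get(tokens[-1])
        if not scores:
            break  # no known continuation for this token
        # Greedy decoding: pick the highest-probability next token.
        tokens.append(max(scores, key=scores.get))
    return " ".join(tokens)

print(generate("the"))  # the cat sat down
```

A real model samples from a probability distribution rather than always taking the top token, which is one reason the same prompt can yield different, and sometimes wrong, completions.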
Hallucinations must be avoided when LLMs drive automation through agents: humans rapidly lose trust in automation when outcomes are counterproductive.
GenAI is driven by data. The LLMs powering generative AI are trained on all the relevant data the (LLM) publishers can get their hands on, starting with all data on the internet. Industry or task-specific models are trained/tuned with domain data.
LLMs are, however, static: their knowledge is frozen at training time. Chat interfaces (e.g. Claude.ai, ChatGPT, Grok) therefore augment responses with internet search to provide current answers as needed and lower the likelihood of hallucinations. Likewise, GenAI-driven agents must consider company data.
How to include company data in GenAI
Common strategies to include company data into GenAI are:
- Fine-tuning: Derive a new (static) LLM from additional data you provide. To fine-tune an LLM successfully you need a large set of well-labeled, clean data, and you must redo the tuning each time you want to rebase your model on one of the latest, ever-improving, base models. Fine-tuning is complex and can be very costly.
- Retrieval Augmented Generation (RAG): Retrieve a set of relevant facts and include them with the query for the model to consider when responding. RAG is more dynamic than fine-tuning, but still adds architectural complexity.
- Provide all relevant context as part of the interaction: Context windows of models have grown significantly, with leading models supporting context windows of a million tokens or more. Depending on the data volume, and the cost to retrieve it, feeding relevant data into the context window can lead to good results.
- Function calling/tool use: Give GenAI access to a queryable data set (e.g. a database, file system or API) and have it decide when and what to retrieve. With AI agents becoming smarter and more independent, thanks to ever-improving LLMs, and MCP (Model Context Protocol) as an open standard, the function calling/tool use approach is rapidly gaining popularity.
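The function calling/tool use loop can be sketched as follows. This is a simplified illustration, not a real agent framework: the `route` function below is a hypothetical keyword router standing in for an LLM deciding which tool to call, and `query_orders` is a made-up tool:

```python
# Minimal sketch of the function calling / tool use pattern.
# A real agent lets the LLM choose the tool and its arguments
# (e.g. via a model's tool-calling API or MCP); here a
# hypothetical keyword router stands in for the model.
def query_orders(customer: str) -> list[dict]:
    # Stand-in for a database or API lookup.
    return [{"customer": customer, "order_id": 1001, "status": "shipped"}]

TOOLS = {"query_orders": query_orders}

def route(question: str) -> tuple[str, dict]:
    # Hypothetical "model decision": map the question to a tool call.
    if "order" in question.lower():
        return "query_orders", {"customer": "Acme"}
    raise ValueError("no suitable tool")

def answer(question: str) -> str:
    tool_name, args = route(question)
    result = TOOLS[tool_name](**args)  # execute the chosen tool
    # A real agent would hand `result` back to the LLM to compose a reply.
    return f"{tool_name} returned {len(result)} row(s): {result[0]['status']}"

print(answer("What is the status of Acme's order?"))
```

The key design point is that the model, not the application code, decides when a tool is needed and with what arguments; the application merely executes the call and returns the result.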
Data Access
Better data should lead to better decisions. That consideration also applies to AI agents. What is the best source of company data? For most organizations and many use cases, the best data source for AI agents is the data warehouse (or data lake, or lakehouse).
The data warehouse provides access to consolidated data across multiple sources, each of which may be a data silo depending on the use case. In contrast to source systems, which are often protected from too-frequent access (e.g. through rate limits), the data warehouse is designed for scalable data access. Besides, not every source system provides API access.
Moreover, different source systems require separate authentication, and each time you add or change a data source you would have to adjust your agent if it queried source systems directly. Finally, given occasional horror stories about AI agents performing unintended operations, it may simply feel less risky to build agentic processing on top of the data warehouse.
Fresh Data
If you are building AI agents with access to the data warehouse, there is just one more thing to worry about: fresh data. To minimize the likelihood of hallucinations you must ensure the data is fresh. How successful can an agent be with access only to yesterday's data? Or even data that is an hour or several minutes old?
You want the AI agent to have access to the most up-to-date data you can provide. For your data warehouse, you want to leverage Change Data Capture (CDC).
CDC identifies data changes. For example, most transaction processing databases record data changes in a log file (e.g. PostgreSQL’s Write-Ahead Log (WAL)). CDC captures these changes and sends or applies them downstream. Likewise, many APIs provide access to new or changed data.
Compared to periodic batch processing, CDC avoids potentially disruptive bursts of activity on your source or target. And continuous CDC enables close-to-real-time data access, keeping data in the data warehouse or data lake fresh.
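The core of CDC, replaying change events downstream, can be sketched as follows. The event format is illustrative, loosely mimicking what a log-based CDC tool might emit from a source database's transaction log; real tools use their own schemas:

```python
# Minimal sketch of applying CDC change events to a target table.
# Field names ("op", "id", "row") are illustrative, not a real
# CDC tool's schema.
events = [
    {"op": "insert", "id": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "id": 2, "row": {"name": "Grace", "plan": "pro"}},
    {"op": "update", "id": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "id": 2},
]

def apply_changes(target: dict, changes: list[dict]) -> dict:
    """Replay change events in order, keeping the target current."""
    for ev in changes:
        if ev["op"] == "delete":
            target.pop(ev["id"], None)
        else:  # insert or update: upsert the new row image
            target[ev["id"]] = ev["row"]
    return target

table = apply_changes({}, events)
print(table)  # {1: {'name': 'Ada', 'plan': 'pro'}}
```

Because only the changed rows flow downstream, and they flow continuously, the target stays close to the source state without the periodic full scans that batch loads require.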
Conclusion
Company executives feel pressured to leverage AI for improved productivity and lower overall costs. AI agents provide a path to unlock productivity gains.
Why do AI agents take incorrect actions? A key, and often overlooked, ingredient of successful AI agent implementations is access to up-to-date company data. Lack of access to fresh, relevant information leads to hallucinations, and hallucinations result in incorrect actions.
It is not the model.

About the author
Mark Van de Wiel was the Field CTO at Fivetran from 2022 through 2025, where he guided enterprise customers in optimizing their data integration strategies. Mark joined Fivetran in 2021 through the acquisition of HVR, where he led US operations and played a key role in scaling the business. His prior experience includes technical leadership roles at Oracle, Actian and GoldenGate Software. Today, Mark builds software using the latest and greatest AI has to offer.
