Thursday, 23 October 2025

Episode #3 - Eight principles to make agentic AI actually work in the enterprise

Integration, contextual intelligence, continuous improvement, and people-first design separate AI leaders from laggards.
Eight principles to make agentic AI actually work in the enterprise - by Tim Leers
Agentic AI: Enterprise reality
The Eight Principles
From Pilots to Impact

In this article, Tim Leers shares eight principles for deploying agentic AI in enterprise environments: systems that autonomously reason, plan, and act to deliver real business outcomes. Based on dozens of deployments, he explains why most initiatives fail: not due to weak models, but because of poor integration, missing context, and lack of continuous improvement.

The piece highlights the shift from demos to production-ready systems, stressing contextual intelligence, people-first design, and governance. Tim outlines practical steps such as leveraging unique data, architecting for change, and building cross-functional teams, while emphasising human–AI partnership and measuring impact through real operating metrics. Talan and dataroots, a Talan company, help organisations move from pilots to scalable, trusted agentic AI solutions that compound value over time.

Agentic AI: Enterprise reality
 

Most enterprise “agents” don’t fail for lack of intelligence: they fail for lack of integration. Demos impress; production stalls. Answers lack context, workflows break, owners are unclear, and improvement loops are missing. Based on dozens of deployments at Talan, we’ll share eight principles that separate durable systems from tech demos.

Callout: A simple agentic AI definition
Agentic AI = systems that can autonomously plan and execute multi-step tasks by combining reasoning, context, and feedback loops. Unlike traditional AI, which predicts or classifies, agentic systems act — they retrieve data, call tools, and coordinate actions toward outcomes.
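
To make this concrete, here is a minimal, illustrative agent loop in Python. Everything in it is a stand-in: `call_model` is stubbed where a real system would call an LLM API, and the single tool mimics an internal service, so the sketch runs on its own.

```python
# A minimal, illustrative agent loop. `call_model` stands in for any
# LLM API; here it is stubbed so the sketch runs without external services.

def call_model(goal: str, history: list[dict]) -> dict:
    """Stub for an LLM call that returns the next action as JSON."""
    if not history:
        return {"tool": "lookup_order", "args": {"order_id": "A-123"}}
    return {"tool": "finish", "args": {"answer": "Order A-123 ships Friday."}}

# Tools the agent may call; here a toy lookup instead of a real system.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "eta": "Friday"},
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[dict] = []
    for _ in range(max_steps):  # bounded step budget: a basic guardrail
        action = call_model(goal, history)                 # reason & plan
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = TOOLS[action["tool"]](**action["args"])   # act
        history.append({"action": action, "observation": result})  # feedback
    return "Escalating to a human: step budget exhausted."

print(run_agent("When does order A-123 ship?"))
```

The loop, not the model, is what makes the system “agentic”: plan, act, observe, repeat, with an explicit escape hatch when the step budget runs out.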

Look at any feed and you’ll see the same message over and over: the next model will change everything; big-tech and frontier-lab CEOs promise near-term AGI timelines; companies announce generative-AI plans and see their stock jump; a new “agentic AI” SaaS promises to reinvent category X.


In practice, most enterprise AI initiatives have stalled. Almost everyone is investing in AI, yet only about 1% of leaders say AI is truly embedded in day-to-day work with meaningful outcomes [1]; close to 90% of generative-AI pilots never reach production [2]. Even when something does reach production, adoption often drops off. Answers lack depth, context is missing, or the system disrupts existing workflows: adding steps, breaking handoffs, or forcing employees to switch to unfamiliar tools.


At Talan, we’ve observed a common pattern across industries: stunning demos collapse in production because they don’t actually use the company’s data, don’t fit the process, and have no owner, no guardrails, or no improvement loop. Vendors sell prompts as agents and use the number of agents deployed as a proxy for impact and value created. Yet investment in generative and agentic AI has not delivered the anticipated impact. Let’s be clear about what “impact” should look like: a named operating metric that moves in production and stays moved, whether that’s time to resolution, first-contact fix, fraud losses, cost per claim, days sales outstanding, or even an uptick in EBITDA.

Callout: Vanity metrics aren’t new

Counting “agents deployed” is the 2020s version of counting computers on desks in the 1980s–1990s.

Back then, executives touted PC installs and server racks, yet productivity didn’t move until organisations redesigned processes, trained people, and integrated systems—the well-known IT productivity paradox.

Today’s AI is the same: deployment counts and “number of agents” are vanity metrics; outcomes are the metric. If you can’t name the operating metric and the user, you don’t have a use case—you have a demo.

Image for emphasis: excerpt from a promotional brochure from 1974!

At Talan we’ve moved from pilots and experiments to systems that compound in value over time once the fundamentals are right. We build for integration and durable value—contextual intelligence, continuous improvement, and people-first design; anchored in workflow fit, clear ownership, and risk-based guardrails.

Here are our eight guiding principles that separate systems delivering meaningful impact from tech demos.

The Eight Principles

Principle 1: Focus on everything but the model

Why agentic AI is different from traditional AI:

In agentic systems, differentiation doesn’t come from the model itself but from how reasoning, data access, and orchestration are wired into your business workflows.

Training or fine-tuning your own models is rarely a worthwhile investment for most enterprises today. Models are improved on a quarterly cadence and get cheaper even faster. Everyone can rent the same model intelligence. More importantly, trying to out-develop a frontier lab like OpenAI is a losing game. 

An abstraction of model performance going up and model cost going down since the release of ChatGPT

Your competitive advantage sits elsewhere in the enterprise agentic AI stack: the data you can trust, the workflow you redesign, and the way your system turns “call a model” into “get the right outcome.” Think of the model as a raw nerve: powerful, generic, and replaceable. The value is in how you wire it into your business.

As Benedict Evans, venture partner at Mosaic Ventures, points out, generative models thrive where “there aren’t necessarily wrong answers” [3], like drafting marketing copy or writing code that can be tested. In enterprises, most tasks demand deterministic and explainable outcomes, often a binary answer that is either right or wrong. Think of policy exceptions, regulatory clauses, SKU prices. In those cases, a more powerful model isn’t what will drive impact.

The bitter lesson for enterprise AI is that to scale, you should focus on your ability to leverage the leaps in model intelligence that happen elsewhere. Your differentiator is your ability to benefit from those leaps more than competitors. That means focusing primarily on three layers of the agentic AI stack for most companies today:

  1. Agentic layer: The “agentic” apps you deploy to create business impact and that end users interact with
  2. Application layer: Apps & services inside your org (e.g. Salesforce) that agents must integrate with to deliver value and fit your workflows
  3. Data layer: From conventional data platform operations that agents leverage, to newer needs: standardising knowledge bases, productising unstructured data, building/extending retrieval systems, and managing prompts & benchmarks
Figure 2. The modern AI stack: infrastructure and models have been largely commoditised; enterprise value & differentiation lives in data, application, and agentic layers

Note: Infrastructure isn’t a commodity for every industry 

While cloud and managed services have abstracted much of the complexity for many enterprises, sectors such as finance, healthcare, and defence still face stringent requirements around data residency, compliance, and latency. For them, infrastructure isn’t an optional enabler—it’s an existential foundation. You either build it, or you partner for it, with extreme guarantees around security, performance, and continuity. The ability to run agentic AI safely and at scale depends on this foundation: without it, orchestration, data passthrough, and real-time context simply aren’t possible.

These layers aren’t a strict hierarchy but an evolving picture of enterprise AI maturity. Agentic capabilities orchestrate across application and data layers: reasoning over information, calling tools, and acting within workflows. The stronger your foundations, the more effective your agentic systems become.

Principle 2: Leverage your unique contextual data

What has changed in agentic AI vs. traditional AI:

Agents rely on retrieval and reasoning: without high-quality, domain-specific context, they can’t make decisions that reflect your business reality. That context spans far beyond conventional training and analytics data.

Large language models are trained on the collective knowledge of our world, including the public internet. That doesn’t mean they will know your policies, prices, or process exceptions. More than ever, garbage in means garbage out. More generally, you need contextual intelligence that can find the right data at the right time: injecting proprietary data, rules, and benchmarks so the system produces the right answer for your business, not just a plausible one.

That means investing in pipelines that clean, tag, and refresh documents, logs, and tacit knowledge. Moreover, it means combining the right data at the right time, with the right instructions, and maintaining “golden sets” of prompts (and the full trace of how an answer was generated) matched with correct answers to build your business-aligned, task-level benchmarks.
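
As a hedged sketch, a golden set can be as simple as canonical prompts paired with business-approved answers and a pointer to the full generation trace, replayed against the system as a task-level benchmark. The field names and the stubbed `answer` pipeline below are illustrative assumptions, not a standard schema.

```python
# A toy golden-set benchmark: canonical prompts with expected answers,
# replayed against the system to measure task-level accuracy.
from dataclasses import dataclass

@dataclass
class GoldenCase:
    prompt: str
    expected: str   # the business-approved answer
    trace_id: str   # pointer to the full trace of how an answer was generated

def answer(prompt: str) -> str:
    """Stand-in for the agentic pipeline (retrieval + model + tools)."""
    return "30 days" if "refund window" in prompt else "unknown"

GOLDEN_SET = [
    GoldenCase("What is the refund window for EU orders?", "30 days", "t-001"),
    GoldenCase("What is the refund window for US orders?", "30 days", "t-002"),
]

def run_benchmark(cases: list[GoldenCase]) -> float:
    hits = sum(answer(c.prompt) == c.expected for c in cases)
    return hits / len(cases)

print(f"task-level accuracy: {run_benchmark(GOLDEN_SET):.0%}")
```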

Every hour spent on data quality, metadata and retrievability will pay far more dividends than an extra hour spent trying to train or finetune a slightly more accurate model. In other words, the value of a model in most use cases is only as good as the data it has access to: value ≈ model intelligence × (relevant data)².

So what does contextual intelligence look like in practice? Mostly, it’s an extension of what you already know, supported by pre-existing data maturity & operations:

  • Governed freshness & lineage: SLAs, access controls, and auditability baked in; no agent should be in charge of deciding who gets access to data
  • On-boarding & applying your best practices to new data sources (e.g. procedures, call notes, tickets, manuals, code, logs): extend the principles of data products to unstructured data and centralise parsing, enrichment, data quality checks, and other default pre-processing operations
  • Extending data product thinking to information retrieval: productising unstructured data only goes so far. Clean, structured text is the beginning, but those documents still need to reach a person or agent at the right time. Task- or question-level optimised retrieval is key, something we’ve started to frame and build as retrieval products: productising the outcome of retrieval systems to give them the same guarantees and scalability potential as data products (see the sketch after this list).
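
Here is a minimal sketch of that retrieval-product idea: a thin contract around a retriever that carries data-product-style guarantees, in this case a freshness SLA plus lineage metadata that travels with every chunk. The names (`RetrievedChunk`, the `kb://` URI) are illustrative assumptions.

```python
# A toy "retrieval product": the retriever's output carries the same
# guarantees as a data product (freshness, lineage) and enforces an SLA.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RetrievedChunk:
    text: str
    source_uri: str            # lineage: where this chunk came from
    last_refreshed: datetime   # freshness metadata travels with the data

def retrieve(query: str) -> list[RetrievedChunk]:
    """Stand-in for the underlying search index."""
    return [RetrievedChunk("Refunds: 30 days.", "kb://policies/refunds.md",
                           datetime.now() - timedelta(hours=2))]

def retrieval_product(query: str, max_age: timedelta) -> list[RetrievedChunk]:
    """Contract: only return chunks that meet the freshness SLA."""
    return [c for c in retrieve(query)
            if datetime.now() - c.last_refreshed <= max_age]

for chunk in retrieval_product("refund policy", max_age=timedelta(days=1)):
    print(chunk.source_uri, "->", chunk.text)
```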

Note: From traditional to agentic data platforms
Delivering contextual intelligence at scale often requires change inside the data layer. Agentic systems depend on low-latency, credential-aware access (agents act under the user’s permissions), caching, and near-real-time data flows—capabilities most traditional data platforms weren’t built for. In practice, that means modernising toward agentic data platforms: secure passthrough, retrieval-ready datasets, and seamless integration across cloud and on-prem resources. Without this foundation, even the best contextual data strategy can’t reach production reliably.
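
A toy sketch of the credential passthrough mentioned above: the agent holds no privileges of its own, so every read is checked against the requesting user's entitlements by the platform, never by the agent. The users, datasets, and ACL are invented for illustration.

```python
# Credential-aware access: queries execute under the user's identity,
# so existing access controls keep applying. All names are illustrative.
ACL = {"alice": {"sales_eu"}, "bob": {"sales_eu", "finance"}}

DATASETS = {
    "sales_eu": ["Q3 EU revenue: 4.2M"],
    "finance": ["EBITDA bridge: confidential"],
}

def query_as(user: str, dataset: str) -> list[str]:
    if dataset not in ACL.get(user, set()):   # enforced by the platform,
        raise PermissionError(f"{user} may not read {dataset}")  # not the agent
    return DATASETS[dataset]

print(query_as("alice", "sales_eu"))  # allowed
# query_as("alice", "finance")        # raises PermissionError
```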

Without contextual intelligence, agentic AI systems are nice tech demos. With contextual intelligence, they’re actually capable of creating meaningful impact, saving time on routine tasks, and enabling transformative change across the organisation.

Principle 3: Architect for change, not monuments

What has changed in agentic AI engineering vs. traditional AI engineering:

Agentic systems coordinate many components to achieve outcomes, including tools, retrieval, and reasoning. That means agentic systems inherit change from every layer of the agentic AI stack, to say nothing of the quickly changing external context, from ecosystem to user expectations. Design has to anticipate that drift.

Agentic AI systems operate in a landscape that’s evolving at breakneck speed. The question isn’t whether your technology stack will change—it’s how deliberately you’ve designed your systems, and the surrounding teams and processes, to absorb that change.

Two and a half years ago, just after the release of ChatGPT, teams shipped value by building heavy scaffolding around LLMs: brittle prompt chains, custom retrieval logic, home-grown orchestration, and layers of post-processing to close capability gaps. It worked, but much of that investment has since turned from enabler to technical debt.

Since then, the centre of gravity has moved upward. AI has continued to “eat” software, absorbing conventional business logic through new features: models now handle structured output and function calling more reliably, context windows have expanded, and retrieval stacks have matured. What required bespoke code yesterday is increasingly available as managed capability today. The result: code that was once essential becomes overhead to maintain.

There’s a trade-off to manage. In high-stakes domains, you still need explicit guardrails and standard operating procedures to guarantee reliability and determinism. But over-investing in hand-crafted “agentic workflows” for every business process anchors you to today’s patterns and tomorrow’s maintenance burden.
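
One way to picture that trade-off is a risk-based guardrail: low-stakes actions run autonomously, high-stakes actions stay deterministic or route to a human. The refund action and threshold below are hypothetical; the pattern is the point.

```python
# A risk-based guardrail sketch: autonomy is tied to stakes, and policy
# lives in config rather than in the model. Values are illustrative.
AUTO_LIMIT = 100.0  # policy expressed as data/config, easy to change

def execute_refund(amount: float, approved_by_human: bool = False) -> str:
    if amount <= AUTO_LIMIT:
        return f"refund of {amount:.2f} issued autonomously"
    if approved_by_human:
        return f"refund of {amount:.2f} issued with human approval"
    return "queued for human review"  # the agent proposes, a human disposes

print(execute_refund(40.0))     # -> issued autonomously
print(execute_refund(2500.0))   # -> queued for human review
```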

The chart shows the divergence in ROI over time stemming from design philosophy: tightly coupled systems deliver early ROI (dashed purple) but decay as the ecosystem advances; decoupled systems (dashed green) compound as components improve. As model intelligence keeps advancing, hard-coding business logic into software contributes less (blue); company-specific context & policy expressed as data/config become the durable source of advantage.

To keep compounding, architect for forward-compatibility:

  • Decouple business logic from model specifics. Express rules and policies in configuration and data rather than code that bakes in a model’s quirks (a minimal sketch follows this list)
  • Consider model-native capabilities once they clear your bar. Adopt function calling, JSON modes, and tool use when latency/accuracy meet benchmarks and unit economics make sense; don’t re-implement what the platform now does well
  • Upgrade by evals, not vibes. Maintain golden sets and task-level benchmarks; when a new model or retriever clears them, adopt and move on.
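
As a compact sketch of the first point, under assumed names: business rules live in versioned config, and the model sits behind a thin adapter, so swapping providers or upgrading versions never touches policy. `POLICY` and the provider label are placeholders.

```python
# Decoupling sketch: policy as data, model behind an adapter boundary.
POLICY = {  # versioned config, not code baked around one model's quirks
    "max_discount_pct": 15,
    "tone": "formal",
}

def build_prompt(task: str, policy: dict) -> str:
    return (f"Task: {task}\n"
            f"Constraints: max discount {policy['max_discount_pct']}%, "
            f"tone {policy['tone']}.")

def call_model(prompt: str, provider: str = "provider_a") -> str:
    """Adapter boundary: swap the body when a better model clears your evals."""
    return f"[{provider}] draft for: {prompt.splitlines()[0]}"

print(call_model(build_prompt("Draft a renewal offer", POLICY)))
```

With this split, “upgrade by evals” becomes mechanical: point the adapter at the candidate model, rerun the golden sets, and ship only if the gates pass.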

Do this well and new releases become drop-in improvements. When models, retrieval, or tool runtimes get better, your system benefits immediately because the pipelines, policies, and UX are already built to absorb change. Accuracy, trust, and adoption improve over quarters instead of eroding with each ecosystem shift. An AI initiative is never “done”; it should continuously absorb change.

Principle 4: Continuous improvement is not optional - it's the product

Agents ought to learn from interaction. Every correction, skipped suggestion, or exception is a training signal for the next iteration of the system. Doing this right means building a product that people want to use, and a system with compounding ROI. 

So what’s different in agentic vs. traditional AI?

MLOps covers the conventional model lifecycle, from data collection to training and deployment of model artefacts. “AgentOps” or LLMOps extends far beyond the model artefact, into systems engineering, frontend design, UX and thus the product lifecycle. 

On top of that, agent behaviour is often ill-defined, meaning that to meet user expectations, one must continuously improve performance or add guardrails to disable undesirable behaviours and outcomes.

Most AI systems are shipped as if launch were the finish line. In reality, launch is day one of a feedback loop. Without that loop, the same mistakes repeat, adoption stalls, and the system fades into irrelevance. With it, accuracy compounds, trust builds, and usage spreads.

Agentic systems are deployed into real workflows, not static sandboxes. That means real-world usage immediately diverges from initial expectations: users edit drafts, skip suggestions, override outputs, escalate edge cases; tools fail; policies change. And yet, many agentic AI systems are still built with a “deploy and forget” mindset: as if the model & agentic workflow were the product. But in practice, the model improves on its own; your system won’t, unless you build the loop that makes it happen.

That loop turns usage into evaluation, and evaluation into decisions. What people accept, reject, or escalate becomes input for a suite of task-level benchmarks—canonical scenarios with expected outputs that reflect how the system is meant to perform. These are used as release gates: the checks that define whether a change improves or harms production quality. Crucially, these benchmarks should not only be owned by AI teams: Data, domain experts and operations teams need to be involved.
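
In code, that loop can be as small as harvesting rejected or escalated interactions into new benchmark cases and gating the next release on them. The interaction records and scores below are invented for illustration; the shape of the loop is what matters.

```python
# From usage to release gate: corrections become benchmark cases,
# and a release ships only if it doesn't regress on them.
interactions = [
    {"prompt": "Summarise claim #88", "user_action": "accepted"},
    {"prompt": "Is clause 4.2 compliant?", "user_action": "rejected",
     "correction": "no: breaches policy P-12"},
]

# 1. Harvest corrections into new benchmark cases
new_cases = [(i["prompt"], i["correction"])
             for i in interactions if i["user_action"] == "rejected"]

# 2. Gate the next release on benchmark scores (no regressions allowed)
def release_gate(candidate_score: float, baseline_score: float) -> bool:
    return candidate_score >= baseline_score

print(new_cases)
print("ship" if release_gate(0.91, 0.88) else "hold")
```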

A simple blueprint of the continuous improvement lifecycle operationalised across the agentic AI stack: Engagement products at the top instrument how the system is used; benchmark infrastructure at the bottom evaluates performance. The loop runs vertically through the stack—across agents, apps, and data—turning real usage into improved performance.

Done right, this becomes a standing product capability: a weekly or bi-weekly cadence of analysis, patching, re-evaluation, and release. Importantly, fixes don’t just land in the model—they show up in retrieval, prompts, tools, data, and UX. The same evaluation gates keep it all aligned. And just as in mature engineering, those gates guard both correctness and regression.

This is systems engineering, not just MLOps. It demands tighter cross-functional collaboration, clearer ownership, and design for long-term maintainability. Without it, your system plateaus. With it, performance compounds. So: don’t just train the model; train the system. The winners treat improvement as a built-in product feature, not a quarterly fire drill. That’s what turns an impressive demo into a durable product.

Principle 5: Invest in people and process, not just technology 

While this principle holds true for traditional AI, it’s even more important for agents:

Agentic AI systems span data, tools, and workflows. That makes them valuable—but it also means no single team can succeed alone.

We’ve seen the pattern: engineering teams build impressive demos, but nothing scales because data lives elsewhere, workflows don’t adapt, and no one owns the system once it’s live. Agentic AI cuts across silos. It needs data engineers to structure knowledge, application owners to wire workflows, and subject-matter experts to define what “good” looks like. If any of those pieces is missing, adoption dies.

The solution is cultural, not technical. Organise around outcomes: build cross-functional squads that own AI systems end to end, with one person who knows the data, one who owns the workflow, one who can ship software, and one who speaks for the end users. The team that ships is the team that improves; ownership must span both the build and the loop.

Sometimes this means reorganising how you deliver value across an entire process or function. Give each squad clear ownership of both launch and continuous improvement. Don’t bury AI inside IT or platform teams; organise around real workflows and business outcomes.


The figure provides a canonical anatomy of a functioning agentic AI operating model: four core capabilities (data, retrieval, engagement, and improvement), each with a clear role, a set of owners, and a contract between teams. When these teams stay siloed, systems fail to ship or stagnate after launch. When integrated through end-to-end squads, the system improves week by week.

Leadership matters here too. Pilots plateau when executives treat AI as an experiment rather than a capability. At the same time, we’ve seen AI pilots derailed not by feasibility, but by incentives. When C-level goals focus on launching “20 use cases” by year-end, teams may chase checkboxes rather than impact. The organisations that move fastest are the ones where leadership sets priorities, ties KPIs to real operating metrics, removes barriers, and commits resources to scaling what works.

The result is simple: systems with clear ownership, embedded in real workflows, and continuously improved by the people who actually use them. Without this, you don’t get enterprise AI. You get prototypes in a slide deck.

Principle 6: Empower users and institutionalise human-AI partnership

Agents are deployed wherever users work and wherever business impact is generated. That means, more than ever, we must not forget how people interact with AI.

Agentic AI works best when humans delegate judgment safely — the interface must make collaboration, not blind trust, the default.

Even the best AI system fails if no one uses it. Adoption is where most projects collapse—not because the model is weak, but because the interface is. A blank chat box forces every user to reinvent the wheel. A good UI does the opposite: it crystallises what the organisation has already learned about what works, so no one has to start from scratch.

A blank chat box is sometimes better replaced with a simple interface, tailored to a specific persona and/or task

Think of Excel. Personal computing only went mainstream once interfaces like menus, templates, and formulas encoded best practice so that anyone could use it. Enterprise AI needs the same. Instead of “just prompt it,” provide structured flows that guide the task: a proposal generator with fields for client name, goals, and product, followed by “Generate Draft.” That interface doesn’t just reduce friction; it encodes institutional knowledge directly into the workflow.
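
A small sketch of that idea: the structured flow is just a template plus named fields, so what the organisation knows about a good proposal is encoded once, in the interface, rather than rediscovered in every prompt. The template and field names are hypothetical.

```python
# A structured flow instead of a blank chat box: the form fields and
# template encode institutional knowledge. All names are illustrative.
PROPOSAL_TEMPLATE = """You are drafting a client proposal.
Client: {client}
Goals: {goals}
Product: {product}
Follow the approved structure: summary, approach, pricing, next steps."""

def call_model(prompt: str) -> str:
    """Stub for the underlying LLM call."""
    return "DRAFT:\n" + prompt

def generate_draft(client: str, goals: str, product: str) -> str:
    prompt = PROPOSAL_TEMPLATE.format(client=client, goals=goals, product=product)
    return call_model(prompt)  # the same model, behind a purpose-built flow

print(generate_draft("Acme NV", "cut claim handling time 30%", "ClaimsAssist"))
```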

Beyond interface, adoption depends on trust and fit. Users should see sources inline, be able to edit outputs, and know that edits feed back into the system. Over time, the assistant learns: it remembers the projects someone is working on, tailors drafts to their context, and improves with use. The more it feels like “my tool,” the faster adoption spreads.

Training and change management still matter. People need to know what the AI can do, what it can’t, and how to work with it. Sharing success stories and creating communities of practice accelerate the curve. And sometimes the right move isn’t inserting AI into today’s workflow but redesigning the workflow itself—offloading rote checks to the machine, reserving judgment and exceptions for humans, and giving both better dashboards in the process.

The end state won’t be “AI replaces humans” and shouldn’t be “humans wrestle with AI.” It’s partnership. People stay in control; AI makes them faster, more consistent, and more focused on what matters. Systems that achieve this become invisible—just part of how work gets done.  

AI that acts like a partner—predictable, auditable, improving with use—earns trust. Adoption follows when people stop thinking of it as a separate tool and start thinking of it as part of how work gets done.

Principle 7: Build trust, transparency, and governance from day one

How it’s different from traditional AI: 

It isn’t. The same holds true. The technical and organisational challenges are just less well-defined, and likely even more significant.

Agentic systems act autonomously; that autonomy requires new layers of observability, auditability, and human override to maintain accountability.

Adoption follows trust. Users won’t rely on a system they can’t verify, leaders won’t fund one they can’t govern, and customers won’t accept one that feels like a black box. Trust isn’t a feature you bolt on later; it has to be designed in from the start. 

That means transparency in the interface—showing sources, exposing rationale, making escalation obvious. It means control—tying autonomy to risk, protecting data end-to-end, and scheduling regular evals for correctness, safety, and bias. And it means auditability—logging prompts, sources, versions, and approvals so you can explain not just what the system said, but why.
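
As a sketch, an audit entry needs to capture enough to reconstruct not just what the system said but why: prompt, sources, versions, and any human sign-off. The schema below is an illustrative assumption, not a standard.

```python
# A minimal audit-trail record for an agentic answer. Fields are illustrative.
import json
from datetime import datetime, timezone

def audit_entry(prompt: str, answer: str, sources: list[str],
                model_version: str, approved_by: str | None) -> str:
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "answer": answer,
        "sources": sources,              # what the answer was grounded in
        "model_version": model_version,  # which stack produced it
        "approved_by": approved_by,      # human sign-off or override, if any
    })

print(audit_entry("Is clause 4.2 compliant?", "No: breaches P-12",
                  ["kb://policies/P-12.md"], "model-2025-10", "j.doe"))
```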

Just as important, trust comes from discipline. Treat AI programs like any other major initiative: define KPIs, measure rigorously, and report impact. Time saved, error reduction, CSAT uplift—when results are visible, confidence grows. With transparency and governance in place, AI shifts from “shiny demo” to “trusted partner.”

Principle 8: Experiment widely, scale deliberately

How it’s different from traditional AI:

Agentic systems make experimentation faster than ever—ideas become working prototypes in days. The conventional AI training lifecycle that can take weeks to months can be circumvented. The differentiator in agentic isn’t who experiments, but who scales with purpose, measures real outcomes, and keeps improving once systems are live.

The future of AI is too fluid to plan your way to certainty. You have to learn by doing. That means running many small, cheap experiments—and killing most of them. The point isn’t to build monuments; it’s to discover, fast, where AI can really move the needle.

Winners then earn the right to scale. Before graduation, pass the hard checks: data integration, access controls, eval coverage, UI fit, ownership, observability. Roll out progressively—from shadow use, to teams, to business units. Keep a living registry of use cases, metrics, and failure modes, so each new deployment gets cheaper and smarter.

Moreover, generative & agentic AI have made prototyping much more effective. No longer do we need experimentation cycles that last several quarters, or product requirement documentation that needs to be validated by each stakeholder before work can begin.

The typical AI project lifecycle

The time to market for a prototype that lets you see whether an idea holds merit has shrunk dramatically since before ChatGPT existed. Yet most companies are still experimenting as if it’s 2020.

Leverage generative & agentic AI to be creative: organise hackathons, inspire, and let everyone contribute to what success could look like. Illustrated below: one use-case discovery session can inspire many proofs of concept, or vice versa. What used to cost weeks or months of development time can now happen in a few days of intense collaboration and AI-assisted prototyping. The challenge has shifted, even more, to moving from pilot to production. And the scarce resource now is simply the attention of engineering teams and stakeholders, overwhelmed by vendor and social-media noise about the seemingly boundless potential of this new technology.

The “agentic” lifecycle is much more dynamic and can be shaped around the organisation, allowing faster and wider exploration

The discipline is in the balance: encourage creativity at the edges, but scale with rigour. Organisations that do this don’t end up with dozens of half-finished proofs of concept. They end up with a portfolio of real systems that compound value over time.

From Pilots to Impact 

The noise around AI will keep shifting: new models, new benchmarks, new promises of AGI just around the corner. But durable impact isn’t found in the headlines. It lives in the slow, often unglamorous work: your data pipelines, your workflows, your people, your ability to measure, and your discipline to improve every week. If you can name the metric, the user, and the owner, and that metric moves in production and stays moved, you have real value. If not, you have a demo.

Enterprise AI isn’t about chasing the next release; it’s about building the systems, practices, and teams that can absorb every release and turn it into meaningful change. Do that, and each model upgrade compounds your advantage instead of resetting your roadmap.

If these principles resonated with you, I recommend watching our webinar that covers some of these principles in more detail.

Book a meeting with me and Talan's AI team. We help our clients develop AI strategies that are both impactful and responsible.

Autonomous AI agents and B2B customer experience with agentic AI

Discover Episodes #1 & #2 of Talan's mini-series on agentic artificial intelligence


Sources

Tim Leers – Global Generative & Agentic AI Lead at dataroots, a Talan company


References 

[1] McKinsey & Company (28 Jan 2025): “Superagency in the workplace”; only ~1% of leaders describe their organisations as AI-mature.

[2] CIO.com, reporting IDC–Lenovo research (25 Mar 2025): 88% of AI pilots fail to reach production.

[3] Benedict Evans (22 Jan 2025): “Are better models better?”; right answers vs. better answers.