Building Custom Conversational AI vs. Sierra: Engineering Hours & Maintenance Reality
Christophe Barre
co-founder of Tandem
Building custom conversational AI takes 12 to 18 months and costs $367K to $476K annually while the activation problem it was meant to solve goes unsolved.
Updated April 24, 2026
TL;DR: Building custom AI to close the activation gap consumes 12 to 18 months while the problem stays open. Sierra offers strong multi-channel customer experience AI across chat, SMS, and voice. Tandem is purpose-built as an AI Agent that sees what users see inside your product and executes workflows directly, deploying via a JavaScript snippet in under an hour. For complex multi-step workflows like account aggregation at Qonto, feature activation doubled from 8% to 16%, representing 100,000 activated users with no additional engineering time. Self-serve account activation rose 20% at Aircall, and Sellsy saw an 18-point activation lift.
When product teams hit a 15% workflow completion rate, the instinct is to build custom AI that can guide users through the complexity.
What most teams discover six to twelve months later is that the AI works in controlled demos, but activation rates haven't moved, because building reliable in-product execution is far more complex than any initial architecture review reveals.
This guide breaks down why custom conversational AI consumes engineering resources without reliably solving activation, what Sierra and Tandem each deliver for different use cases, and how to make a build vs. buy decision your board can defend with real numbers.
Evaluating build vs. buy for conversational AI
Conversational AI covers a wide range of problems: multi-channel customer support, internal workflow assistance, in-product onboarding, and virtual agents. Before committing to a build, it helps to separate the parts of this stack that differentiate your product from the parts that are commodity infrastructure any vendor has already solved.
Why activation stays broken during custom AI builds
Most product teams start building custom AI with a clear goal: to help users complete complex workflows that currently have a 15% completion rate. Industry data shows only 5% of users complete multi-step walkthroughs, and 36 to 38% of SaaS users activate successfully, leaving the majority of trial users churning before they reach first value.
The challenge isn't the AI model. It's the execution layer: safe action sequencing, context preservation across multi-step workflows, and graceful recovery when actions fail or users deviate from expected paths. These are separate engineering projects, and teams consistently underestimate how long they take.
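To make that gap concrete, here is a minimal sketch of what "safe action sequencing" and "graceful recovery" mean in code. Every name here is illustrative (the step and context types are assumptions, not any vendor's API); a production version would add retries, telemetry, and user-visible escalation.

```typescript
// A workflow is a sequence of steps; if any step fails, already-completed
// steps are rolled back so the user is not left in a half-configured state.
type StepResult = { ok: true } | { ok: false; error: string };

interface WorkflowStep {
  name: string;
  run: (ctx: Record<string, unknown>) => StepResult;
  // Optional compensating action, invoked when a later step fails.
  undo?: (ctx: Record<string, unknown>) => void;
}

function runWorkflow(
  steps: WorkflowStep[],
  ctx: Record<string, unknown>,
): { completed: string[]; failedAt?: string } {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run(ctx);
    if (!result.ok) {
      // Roll back completed steps in reverse order.
      for (const done of [...steps].reverse()) {
        if (completed.includes(done.name)) done.undo?.(ctx);
      }
      return { completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { completed };
}
```

Even this toy version shows why the execution layer is its own project: failure recovery, ordering, and partial-state cleanup all live outside the model.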
Enterprise AI projects typically take 12 to 18 months from initiation to production deployment, a timeline reflected across comprehensive rollouts that include assessment, pilot development, and scaling phases. That's not a worst-case estimate for under-resourced teams. It reflects experienced engineering organizations building AI infrastructure with real users and real edge cases.
What activation lift actually requires
Even a modest improvement in activation translates directly into new ARR without additional acquisition spend, and that's the business case for in-product AI. But achieving it requires AI that understands user context, knows where they are in a workflow, and can explain a concept, guide the next step, or execute a repetitive action depending on what the user actually needs.
A chatbot that reads help docs and generates text responses doesn't deliver this. Neither does a custom build still in month nine of a six-month timeline. The onboarding metrics guide quantifies what each percentage-point improvement in activation means for recovered ARR, making the opportunity cost of delayed AI investment concrete.
Sierra vs. embedded agents: different use cases
Sierra is a conversational AI platform built for enterprise customer experience teams. Its agents connect with CRM, order management, and knowledge base systems to resolve customer inquiries across chat, SMS, WhatsApp, email, and voice without human intervention, which is valuable for multi-channel support use cases.
Tandem solves a different problem. It lives inside your product, sees the user's actual screen state, and can explain, guide, or execute based on what the user is trying to accomplish in the moment. For example, when a user needs to understand how a feature works, Tandem explains the concept.
When they need to complete a multi-step workflow, such as integration configuration, Tandem can guide them through each step or handle the repetitive parts. Sierra's multi-channel support architecture isn't built for these in-product moments. Tandem is.
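As a hypothetical illustration of the explain / guide / execute distinction, a mode selector inside an in-product agent might look like the sketch below. The heuristic and type names are assumptions made for the example, not Tandem's actual implementation.

```typescript
// Three response modes an in-product agent can choose between.
type Mode = "explain" | "guide" | "execute";

interface UserContext {
  question: string;
  inMultiStepWorkflow: boolean; // user is mid-way through a flow on screen
  requestedAutomation: boolean; // user asked the agent to do it for them
}

function chooseMode(ctx: UserContext): Mode {
  if (ctx.requestedAutomation) return "execute"; // do the repetitive part
  if (ctx.inMultiStepWorkflow) return "guide"; // walk through each step
  return "explain"; // conceptual question, answer in text
}
```

The point of the sketch is that the decision depends on screen state and intent, which is exactly the context a channel-based support agent never sees.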
These are not competing solutions for the same buyer. Sierra serves customer experience teams managing inbound volume across channels. Tandem serves product and growth teams trying to lift trial activation and feature adoption inside the product itself.
Initial build phase: engineering hours deep dive
Understanding the true cost of a custom AI build starts with mapping the timeline and team requirements in detail. Here's a realistic engineering breakdown for organizations evaluating whether to build in-house or adopt a purpose-built platform.
Engineering hours: initial AI build
Enterprise AI implementations typically require 12 to 18 months for comprehensive rollouts. This timeline includes four to six weeks for assessment and scoping, three to four months for pilot development, and six to eight months for scaling to production. The consistency of this pattern across organizations makes it a planning baseline rather than an edge-case risk.
Most teams targeting a functional V1 follow an eight-month development arc that looks like this:
Weeks 1-4: Scoping, model selection, architecture decisions, and environment setup.
Months 2-3: Core conversational logic, RAG pipeline, and initial prompt architecture.
Month 4: UI integration layer and action sequencing scaffolding.
Months 5-6: Internal testing, edge case handling, and monitoring setup.
Months 7-8: Closed beta with real users, reliability fixes, and escalation path wiring.
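To ground the "RAG pipeline" item in months 2-3, here is a deliberately minimal retrieval-and-prompt sketch. Real builds use embedding similarity over a vector store; keyword overlap stands in here so the example stays self-contained, and all function names are illustrative.

```typescript
// Score a document by how many query terms it contains.
function scoreDoc(query: string, doc: string): number {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return doc.toLowerCase().split(/\W+/).filter((w) => terms.has(w)).length;
}

// Return the k highest-scoring documents for the query.
function retrieve(query: string, docs: string[], k = 2): string[] {
  return [...docs]
    .sort((a, b) => scoreDoc(query, b) - scoreDoc(query, a))
    .slice(0, k);
}

// Assemble the retrieved context into a grounded prompt.
function buildPrompt(query: string, docs: string[]): string {
  const context = retrieve(query, docs).join("\n---\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${query}`;
}
```

The retrieval step itself is a week of work; the months go into evaluation, chunking strategy, and keeping the index in sync with a changing product.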
By month eight, the typical outcome is a system that successfully handles the majority of user sessions but still requires at least one dedicated sprint per month to address newly discovered failure modes.
The team you actually need
Building a viable custom AI solution requires three distinct engineering specializations, and these skill sets rarely overlap in a single hire:
ML/AI engineer: Model selection, prompt architecture, RAG pipeline design, evaluation framework, and response quality monitoring.
Frontend specialist: UI integration layer, action sequencing for form fills and multi-step navigation, and browser compatibility.
Backend/orchestration engineer: Tool-calling frameworks, API integrations, state management across multi-turn conversations, and error-handling logic.
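The orchestration role above centers on tool calling: the model proposes a tool name and arguments, and the backend must validate and execute them safely. A minimal sketch, with illustrative tool names rather than any real framework's API:

```typescript
// Registry mapping tool names the model may propose to backend handlers.
type ToolHandler = (args: Record<string, string>) => string;

const tools = new Map<string, ToolHandler>([
  ["lookup_account", (args) => `account:${args.id}`],
  ["open_settings", () => "navigated:settings"],
]);

// Validate the proposed call before executing; never run unknown tools.
function dispatch(call: { name: string; args: Record<string, string> }): string {
  const handler = tools.get(call.name);
  if (!handler) {
    return `error:unknown_tool:${call.name}`;
  }
  try {
    return handler(call.args);
  } catch {
    return `error:tool_failed:${call.name}`;
  }
}
```

The registry-plus-validation pattern is what keeps a model's free-text output from becoming arbitrary action execution, and it is one of the pieces each of the three roles above touches.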
Most teams begin by distributing this work across two senior engineers, only to discover mid-project that the UI interaction layer alone demands a dedicated resource. This realization is often the first point where timelines begin to slip, and for product teams building in-app AI agents, it is frequently where the build vs. buy question gets revisited.
The activation opportunity cost
Each engineer-month invested in building AI infrastructure is an engineer-month diverted from shipping the product features that directly drive activation and revenue. During the months your team spends constructing the execution engine, the activation gap remains unaddressed. The product adoption guide for technical builders explores why this timeline consistently catches even experienced engineering leaders off guard, despite their track record shipping complex systems.
Production reliability challenges
Once your custom AI reaches production, two distinct classes of reliability problems emerge. The first is the gap between controlled demos and real-world user behavior; the second is the ongoing disruption caused by upstream model provider changes.
The demo-to-production gap
The demo-to-production gap is a measured phenomenon: near-perfect performance in controlled testing environments drops significantly once real users introduce unexpected inputs and edge cases. The onboarding mistakes guide maps the most common failure patterns that emerge during this transition.
For activation-focused use cases, this reliability gap directly impacts trial conversion, time-to-first-value, and feature adoption rates. The last 10% of reliability, handling the long tail of edge cases and unhandled state changes, requires a disproportionate share of the engineering investment to close.
Model updates and production stability
Model providers ship API updates on unpredictable schedules. LLM applications routinely experience service disruptions during major model updates, requiring prompt audit cycles, regression testing, and staging validation before changes reach production. When a provider deprecates a model or changes its tool-calling schema, every custom orchestration layer built against that API must be rewritten before the deprecation deadline.
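One standard mitigation is to hide each provider behind a thin adapter, so a deprecation forces a rewrite of one adapter rather than every call site. A sketch under assumed interfaces (the provider classes here are stand-ins, not real SDKs):

```typescript
// Call sites depend only on this interface, never on a provider SDK.
interface ChatProvider {
  complete(prompt: string): string;
}

// Each concrete adapter owns one provider's request/response shape.
class ProviderV1 implements ChatProvider {
  complete(prompt: string): string {
    return `v1-response(${prompt})`;
  }
}

class ProviderV2 implements ChatProvider {
  complete(prompt: string): string {
    return `v2-response(${prompt})`;
  }
}

// Swapping providers is a one-line change at the injection point.
function answer(provider: ChatProvider, question: string): string {
  return provider.complete(question);
}
```

The adapter contains the blast radius of an API change, but someone still has to write, test, and re-validate the new adapter on the provider's schedule, which is exactly the maintenance tax described above.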
Tandem addresses production stability through an architecture that automatically adapts to changes, so product teams can focus on content quality rather than technical fixes. The Tandem vs. CommandBar comparison explains how this works in practice.
AI project economics: the real numbers
Understanding the full financial picture requires looking beyond initial engineering costs to the ongoing work, retention risks, and opportunity costs that accumulate over time. The real economics of custom AI builds compared to vendor platforms like Sierra and Tandem give your team the complete cost visibility needed to make a defensible decision.
Ongoing work: what all platforms require
All in-app guidance platforms, Sierra and Tandem included, require continuous content work: writing playbooks, updating targeting rules, and refining experiences as your product evolves, whether you build or buy. The difference with a custom build is that the technical work also falls entirely on your engineering team instead of being handled by the vendor. The 90-day CX transformation guide outlines how product teams reclaim engineering capacity by shifting to purpose-built platforms.
Engineer retention risk
Top engineers take roles at product-stage companies to build differentiated capabilities, not to maintain infrastructure on a rotating schedule. The "glue work" problem in AI maintenance is well-documented in engineering practice: engineers are pulled into ongoing reliability work that organizations often don't recognize as promotable, creating a retention risk that compounds over time.
Total ownership cost: Sierra vs. build
The table below compares the three paths across the dimensions that matter most to engineering and finance leaders evaluating this decision. Each row reflects realistic planning figures, not best-case projections.
| Dimension | Custom build | Sierra | Tandem |
|---|---|---|---|
| Initial cost | ~$300K (2 engineers x 6 months) | Enterprise contract | JavaScript snippet, under 1 hour |
| Time to production | 6+ months minimum | Professional services engagement | Days |
| Ongoing technical work | Engineering-owned | Vendor managed | Vendor managed |
| Execution capability | Tailored to scope | Multi-channel CX actions | In-product UI execution |
| Primary use case | Whatever you scope | Customer experience, multi-channel | In-product activation and adoption |
| Activation impact | Delayed 6+ months | Support resolution focus | 18-20% activation lift (Aircall, Sellsy) |
Sierra's implementation involves professional services engagement, system integration scoping, and multi-channel deployment configuration across CRM and order management systems. This typically takes several months for enterprise implementations. Tandem requires a JavaScript snippet added to your application in under an hour. After that, product teams own all configuration through a no-code interface, with no engineering involvement required. At Aircall, the team was live within days.
Why custom conversational AI projects fail
Most custom AI projects don't fail because of the model; they fail because of scope expansion and delayed switching decisions. Two patterns account for the majority of stalled builds.
Managing custom AI scope bloat
"Just a simple chatbot" is the most dangerous phrase in an AI project kickoff. Simple chatbots answer questions. What product teams actually need is an execution engine that navigates multi-step workflows, handles authentication flows, fills multi-field forms, and recovers gracefully from user corrections mid-workflow. Each capability is a separate engineering project, and the scope expands naturally as stakeholders see early demos and request the next feature.
The experiences page shows the full range of in-product responses, from guided explanations for users who need context to understand a feature, to direct execution for users who need a workflow completed. Most internal builds reach only a fraction of this range by month eight.
When to evaluate a vendor switch
The right time to evaluate a vendor switch from an in-progress custom build is before the team reaches month six with an unstable V1. The investment already spent in the first six months is real, but it is smaller than the ongoing cost of two to four engineers allocated to technical work indefinitely. A useful signal is the ratio of feature sprints to infrastructure sprints: when infrastructure work consumes a significant share of sprint capacity, the compounding cost of continuing to build internally typically exceeds the vendor cost within 12 months.
When custom AI drains development capacity, the activation gap that motivated the project in the first place stays open. Advanced feature adoption sits at 10 to 15% despite months of development investment, and this number does not improve when the engineering team building those features is splitting attention between new development and AI infrastructure work.
Making the build vs. buy decision
The right framework for this decision starts with your use case, your data requirements, and your team's capacity, not with the AI itself. Three factors determine the right answer: the conditions that justify a custom build, the compliance baseline for any vendor you evaluate, and the activation opportunity cost specific to your product.
Conditions for a custom AI build
A custom build makes sense under specific conditions:
Proprietary data constraints: Your AI requires fine-tuning on confidential datasets that cannot leave your infrastructure, such as a healthcare platform training on de-identified patient records, or a financial services product building risk models on transaction data governed by internal data residency policies.
Deep internal coupling: Your use case is tightly coupled to internal systems, with no vendor offering comparable integration depth.
Existing AI infrastructure: Your team already operates AI infrastructure that a vendor solution cannot extend without replacing it wholesale.
Outside these conditions, you are rebuilding commodity infrastructure at a cost that compounds each year.
Security and compliance baseline
Whether you are evaluating Sierra, Tandem, or any other AI vendor handling your product data, look for SOC 2 Type II certification, GDPR compliance documentation, and AES-256 encryption as the standard for B2B SaaS environments. Non-compliance with GDPR creates contractual liability and procurement delays that slow vendor onboarding, making compliance validation a standard requirement before any AI vendor handling EU personal data reaches production.
Tandem is SOC 2 certified and GDPR compliant, which removes this build from your internal engineering scope entirely. The digital adoption platform guide outlines what to evaluate when assessing vendor security posture.
Calculate your activation opportunity
Before your next planning cycle, measure two numbers: your trial-to-paid conversion rate and the percentage of users who complete core workflows within seven days. If conversion is below 20% and workflow completion is below 40%, the activation gap is costing you more in lost revenue than a purpose-built solution would cost in subscription fees.
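The back-of-envelope math suggested above can be written down directly. The revenue model here is a simplifying assumption (each additionally activated user converts at your trial-to-paid rate and is worth one ACV), not a benchmark:

```typescript
// Estimate annual recurring revenue recovered by an activation lift.
// Simplifying assumption: activated users convert at the trial-to-paid
// rate and each conversion is worth one ACV.
function recoveredArr(
  monthlySignups: number,
  activationLiftPts: number, // e.g. 0.18 for an 18-point lift
  trialToPaidRate: number, // conversion rate among activated users
  acv: number, // annual contract value per customer
): number {
  const extraActivatedPerYear = monthlySignups * 12 * activationLiftPts;
  return extraActivatedPerYear * trialToPaidRate * acv;
}
```

Plugging in your own signup volume, lift, conversion rate, and ACV turns the activation gap into a dollar figure you can weigh against either a build budget or a subscription fee.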
When Qonto deployed Tandem, the results went beyond percentage gains; 100,000 users discovered and activated paid features, with account aggregation activation doubling from 8% to 16%. The same pattern held across different product categories: self-serve account activation rose 20% at Aircall, and Sellsy lifted activation by 18 percentage points. Measure those outcomes against your own ACV and monthly signup volume to see what the ROI case looks like for your specific metrics.
If your custom build is absorbing sprint capacity without moving activation, schedule a demo with Tandem to see how quickly a contextual AI Agent deploys in your product and what activation lift looks like in your specific use case.
FAQs
How long does Tandem take to deploy compared to a custom AI build?
Tandem's JavaScript snippet installs in under an hour with no backend changes required, and product teams configure playbooks through a no-code interface within days. A custom AI build takes 12 to 18 months to reach stable, predictable performance across real-world use cases, encompassing latency, hallucination rate, robustness, and edge case handling.
What activation lift can product teams expect from an AI Agent?
Customers typically see 18 to 20 percentage point improvements in activation for complex multi-step processes. At Qonto, account aggregation activation doubled from 8% to 16%, and Aircall saw a 20% increase in self-serve account activation.
How does Sierra differ from an in-product AI Agent like Tandem?
Sierra focuses on multi-channel customer experience (chat, SMS, email, voice) with strong inquiry resolution for support use cases. AI Agents like Tandem focus on in-product workflow execution to drive feature activation and adoption, seeing what users see and executing actions directly inside the product UI.
What does a custom AI build actually cost?
A two-person team (one ML engineer plus one frontend specialist) runs $367,000 to $476,000 annually at fully loaded rates, based on levels.fyi compensation data and a 1.3x overhead multiplier. As with all in-app guidance platforms, ongoing content work is required whether you build or buy; the difference is whether technical infrastructure work also falls on your team.
Key terms glossary
Activation rate: The percentage of new users who reach a defined "aha moment" or complete core setup within a specified time window. Industry average for B2B SaaS sits at 36 to 38%, with many high-performing products investing in contextual in-product guidance.
Time-to-first-value (TTV): The elapsed time from a user's first login to their first meaningful outcome in the product. Lower TTV correlates directly with higher trial-to-paid conversion and reduced logo churn in the first 90 days.
Total cost of ownership (TCO): The fully loaded cost of a technology decision over a defined period, including initial development, personnel, infrastructure, and ongoing work, used to compare build vs. buy options on equivalent financial terms.
AI Agent: An AI system that perceives context, reasons about user goals, and takes actions in the environment, as opposed to a chatbot that generates text responses only. Production-grade AI Agents require action sequencing, context preservation, and failure recovery beyond what a conversational model alone provides.