Building Custom Conversational AI vs. Sierra: Engineering Hours & Maintenance Reality
Christophe Barre
co-founder of Tandem
Building custom conversational AI takes 12 to 18 months and costs $367K to $476K annually while the activation problem it was meant to solve goes unsolved.
Updated April 24, 2026
TL;DR: Building custom AI to close the activation gap consumes 12 to 18 months while the problem stays open. Sierra offers strong multi-channel customer experience AI across chat, SMS, and voice. Tandem is purpose-built as an AI Agent that sees what users see inside your product and executes workflows directly, deploying via a JavaScript snippet in under an hour. For complex multi-step workflows like account aggregation at Qonto, feature activation doubled from 8% to 16%, representing 100,000 activated users with no additional engineering time. Self-serve account activation rose 20% at Aircall, and Sellsy saw an 18-point activation lift.
When product teams hit a 15% workflow completion rate, the instinct is to build custom AI that can guide users through the complexity.
What most teams discover six to twelve months later is that the AI works in controlled demos, but activation rates haven't moved, because building reliable in-product execution is far more complex than any initial architecture review reveals.
This guide breaks down why custom conversational AI consumes engineering resources without reliably solving activation, what Sierra and Tandem each deliver for different use cases, and how to make a build vs. buy decision your board can defend with real numbers.
Evaluating build vs. buy for conversational AI
Conversational AI covers a wide range of problems: multi-channel customer support, internal workflow assistance, in-product onboarding, and virtual agents. Before committing to a build, it helps to separate the parts of this stack that differentiate your product from the parts that are commodity infrastructure any vendor has already solved.
Why activation stays broken during custom AI builds
Most product teams start building custom AI with a clear goal: to help users complete complex workflows that currently have a 15% completion rate. Industry data shows only 5% of users complete multi-step walkthroughs, and 36 to 38% of SaaS users activate successfully, leaving the majority of trial users churning before they reach first value.
The challenge isn't the AI model. It's the execution layer: safe action sequencing, context preservation across multi-step workflows, and graceful recovery when actions fail or users deviate from expected paths. These are separate engineering projects, and teams consistently underestimate how long they take.
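To make that gap concrete, here is a minimal sketch of what "safe action sequencing" and "graceful recovery" mean in code. Every name here is illustrative (the step and context types are assumptions, not any vendor's API); a production version would add retries, telemetry, and user-visible escalation.

```typescript
// A workflow is a sequence of steps; if any step fails, already-completed
// steps are rolled back so the user is not left in a half-configured state.
type StepResult = { ok: true } | { ok: false; error: string };

interface WorkflowStep {
  name: string;
  run: (ctx: Record<string, unknown>) => StepResult;
  // Optional compensating action, invoked when a later step fails.
  undo?: (ctx: Record<string, unknown>) => void;
}

function runWorkflow(
  steps: WorkflowStep[],
  ctx: Record<string, unknown>,
): { completed: string[]; failedAt?: string } {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run(ctx);
    if (!result.ok) {
      // Roll back completed steps in reverse order.
      for (const done of [...steps].reverse()) {
        if (completed.includes(done.name)) done.undo?.(ctx);
      }
      return { completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { completed };
}
```

Even this toy version shows why the execution layer is its own project: failure recovery, ordering, and partial-state cleanup all live outside the model.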
Enterprise AI projects typically take 12 to 18 months from initiation to production deployment, a timeline reflected across comprehensive rollouts that include assessment, pilot development, and scaling phases. That's not a worst-case estimate for under-resourced teams. It reflects experienced engineering organizations building AI infrastructure with real users and real edge cases.
What activation lift actually requires
Even a modest improvement in activation translates directly into new ARR without additional acquisition spend, and that's the business case for in-product AI. But achieving it requires AI that understands user context, knows where they are in a workflow, and can explain a concept, guide the next step, or execute a repetitive action depending on what the user actually needs.
A chatbot that reads help docs and generates text responses doesn't deliver this. Neither does a custom build still in month nine of a six-month timeline. The onboarding metrics guide quantifies what each percentage-point improvement in activation means for recovered ARR, making the opportunity cost of delayed AI investment concrete.
Sierra vs. embedded agents: different use cases
Sierra is a conversational AI platform built for enterprise customer experience teams. Its agents connect with CRM, order management, and knowledge base systems to resolve customer inquiries across chat, SMS, WhatsApp, email, and voice without human intervention, which is valuable for multi-channel support use cases.
Tandem solves a different problem. It lives inside your product, sees the user's actual screen state, and can explain, guide, or execute based on what the user is trying to accomplish in the moment. For example, when a user needs to understand how a feature works, Tandem explains the concept.
When they need to complete a multi-step workflow, such as integration configuration, Tandem can guide them through each step or handle the repetitive parts. Sierra's multi-channel support architecture isn't built for these in-product moments. Tandem is.
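As a hypothetical illustration of the explain / guide / execute distinction, a mode selector inside an in-product agent might look like the sketch below. The heuristic and type names are assumptions made for the example, not Tandem's actual implementation.

```typescript
// Three response modes an in-product agent can choose between.
type Mode = "explain" | "guide" | "execute";

interface UserContext {
  question: string;
  inMultiStepWorkflow: boolean; // user is mid-way through a flow on screen
  requestedAutomation: boolean; // user asked the agent to do it for them
}

function chooseMode(ctx: UserContext): Mode {
  if (ctx.requestedAutomation) return "execute"; // do the repetitive part
  if (ctx.inMultiStepWorkflow) return "guide"; // walk through each step
  return "explain"; // conceptual question, answer in text
}
```

The point of the sketch is that the decision depends on screen state and intent, which is exactly the context a channel-based support agent never sees.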
These are not competing solutions for the same buyer. Sierra serves customer experience teams managing inbound volume across channels. Tandem serves product and growth teams trying to lift trial activation and feature adoption inside the product itself.
Initial build phase: engineering hours deep dive
Understanding the true cost of a custom AI build starts with mapping the timeline and team requirements in detail. Here's a realistic engineering breakdown for organizations evaluating whether to build in-house or adopt a purpose-built platform.
Engineering hours: initial AI build
Enterprise AI implementations typically require 12 to 18 months for comprehensive rollouts. This timeline includes four to six weeks for assessment and scoping, three to four months for pilot development, and six to eight months for scaling to production. The consistency of this pattern across organizations makes it a planning baseline rather than an edge-case risk.
Most teams targeting a functional V1 follow an eight-month development arc that looks like this:
Weeks 1-4: Scoping, model selection, architecture decisions, and environment setup.
Months 2-3: Core conversational logic, RAG pipeline, and initial prompt architecture.
Month 4: UI integration layer and action sequencing scaffolding.
Months 5-6: Internal testing, edge case handling, and monitoring setup.
Months 7-8: Closed beta with real users, reliability fixes, and escalation path wiring.
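To ground the "RAG pipeline" item in months 2-3, here is a deliberately minimal retrieval-and-prompt sketch. Real builds use embedding similarity over a vector store; keyword overlap stands in here so the example stays self-contained, and all function names are illustrative.

```typescript
// Score a document by how many query terms it contains.
function scoreDoc(query: string, doc: string): number {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return doc.toLowerCase().split(/\W+/).filter((w) => terms.has(w)).length;
}

// Return the k highest-scoring documents for the query.
function retrieve(query: string, docs: string[], k = 2): string[] {
  return [...docs]
    .sort((a, b) => scoreDoc(query, b) - scoreDoc(query, a))
    .slice(0, k);
}

// Assemble the retrieved context into a grounded prompt.
function buildPrompt(query: string, docs: string[]): string {
  const context = retrieve(query, docs).join("\n---\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${query}`;
}
```

The retrieval step itself is a week of work; the months go into evaluation, chunking strategy, and keeping the index in sync with a changing product.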
By month eight, the typical outcome is a system that successfully handles the majority of user sessions but still requires at least one dedicated sprint per month to address newly discovered failure modes.
The team you actually need
Building a viable custom AI solution requires three distinct engineering specializations, and these skill sets rarely overlap in a single hire:
ML/AI engineer: Model selection, prompt architecture, RAG pipeline design, evaluation framework, and response quality monitoring.
Frontend specialist: UI integration layer, action sequencing for form fills and multi-step navigation, and browser compatibility.
Backend/orchestration engineer: Tool-calling frameworks, API integrations, state management across multi-turn conversations, and error-handling logic.
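The orchestration role above centers on tool calling: the model proposes a tool name and arguments, and the backend must validate and execute them safely. A minimal sketch, with illustrative tool names rather than any real framework's API:

```typescript
// Registry mapping tool names the model may propose to backend handlers.
type ToolHandler = (args: Record<string, string>) => string;

const tools = new Map<string, ToolHandler>([
  ["lookup_account", (args) => `account:${args.id}`],
  ["open_settings", () => "navigated:settings"],
]);

// Validate the proposed call before executing; never run unknown tools.
function dispatch(call: { name: string; args: Record<string, string> }): string {
  const handler = tools.get(call.name);
  if (!handler) {
    return `error:unknown_tool:${call.name}`;
  }
  try {
    return handler(call.args);
  } catch {
    return `error:tool_failed:${call.name}`;
  }
}
```

The registry-plus-validation pattern is what keeps a model's free-text output from becoming arbitrary action execution, and it is one of the pieces each of the three roles above touches.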
Most teams begin by distributing this work across two senior engineers, only to discover mid-project that the UI interaction layer alone demands a dedicated resource. This realization is often the first point where timelines begin to slip, and for product teams building in-app AI agents, it is frequently where the build vs. buy question gets revisited.
The activation opportunity cost
Each engineer-month invested in building AI infrastructure is an engineer-month diverted from shipping the product features that directly drive activation and revenue. During the months your team spends constructing the execution engine, the activation gap remains unaddressed. The product adoption guide for technical builders explores why this timeline consistently catches even experienced engineering leaders off guard, despite their track record shipping complex systems.
Production reliability challenges
Once your custom AI reaches production, two distinct classes of reliability problems emerge. The first is the gap between controlled demos and real-world user behavior; the second is the ongoing disruption caused by upstream model provider changes.
The demo-to-production gap
The demo-to-production gap is a measured phenomenon: near-perfect performance in controlled testing environments drops significantly once real users introduce unexpected inputs and edge cases. The onboarding mistakes guide maps the most common failure patterns that emerge during this transition.
For activation-focused use cases, this reliability gap directly impacts trial conversion, time-to-first-value, and feature adoption rates. The last 10% of reliability, handling the long tail of edge cases and unhandled state changes, requires a disproportionate share of the engineering investment to close.
Model updates and production stability
Model providers ship API updates on unpredictable schedules. LLM applications routinely experience service disruptions during major model updates, requiring prompt audit cycles, regression testing, and staging validation before changes reach production. When a provider deprecates a model or changes its tool-calling schema, every custom orchestration layer built against that API must be rewritten before the deprecation deadline.
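One standard mitigation is to hide each provider behind a thin adapter, so a deprecation forces a rewrite of one adapter rather than every call site. A sketch under assumed interfaces (the provider classes here are stand-ins, not real SDKs):

```typescript
// Call sites depend only on this interface, never on a provider SDK.
interface ChatProvider {
  complete(prompt: string): string;
}

// Each concrete adapter owns one provider's request/response shape.
class ProviderV1 implements ChatProvider {
  complete(prompt: string): string {
    return `v1-response(${prompt})`;
  }
}

class ProviderV2 implements ChatProvider {
  complete(prompt: string): string {
    return `v2-response(${prompt})`;
  }
}

// Swapping providers is a one-line change at the injection point.
function answer(provider: ChatProvider, question: string): string {
  return provider.complete(question);
}
```

The adapter contains the blast radius of an API change, but someone still has to write, test, and re-validate the new adapter on the provider's schedule, which is exactly the maintenance tax described above.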
Tandem addresses production stability through an architecture that automatically adapts to changes, so product teams can focus on content quality rather than technical fixes. The Tandem vs. CommandBar comparison explains how this works in practice.
AI project economics: the real numbers
Understanding the full financial picture requires looking beyond initial engineering costs to the ongoing work, retention risks, and opportunity costs that accumulate over time. The real economics of custom AI builds compared to vendor platforms like Sierra and Tandem give your team the complete cost visibility needed to make a defensible decision.
Ongoing work: what all platforms require
All in-app guidance platforms, Sierra and Tandem included, require continuous content work: writing playbooks, updating targeting rules, and refining experiences as your product evolves, whether you build or buy. The difference with a custom build is that the technical work also falls entirely on your engineering team instead of being handled by the vendor. The 90-day CX transformation guide outlines how product teams reclaim engineering capacity by shifting to purpose-built platforms.
Engineer retention risk
Top engineers take roles at product-stage companies to build differentiated capabilities, not to maintain infrastructure on a rotating schedule. The "glue work" problem in AI maintenance is well-documented in engineering practice: engineers are pulled into ongoing reliability work that organizations often don't recognize as promotable, creating a retention risk that compounds over time.
Total ownership cost: Sierra vs. build
The table below compares the three paths across the dimensions that matter most to engineering and finance leaders evaluating this decision. Each row reflects realistic planning figures, not best-case projections.
| Dimension | Custom build | Sierra | Tandem |
|---|---|---|---|
| Initial cost | ~$300K (2 engineers x 6 months) | Enterprise contract | JavaScript snippet, under 1 hour |
| Time to production | 6+ months minimum | Professional services engagement | Days |
| Ongoing technical work | Engineering-owned | Vendor managed | Vendor managed |
| Execution capability | Tailored to scope | Multi-channel CX actions | In-product UI execution |
| Primary use case | Whatever you scope | Customer experience, multi-channel | In-product activation and adoption |
| Activation impact | Delayed 6+ months | Support resolution focus | 18-20% activation lift (Aircall, Sellsy) |
Sierra's implementation involves professional services engagement, system integration scoping, and multi-channel deployment configuration across CRM and order management systems. This typically takes several months for enterprise implementations. Tandem requires a JavaScript snippet added to your application in under an hour. After that, product teams own all configuration through a no-code interface, with no engineering involvement required. At Aircall, the team was live within days.
Why custom conversational AI projects fail
Most custom AI projects don't fail because of the model; they fail because of scope expansion and delayed switching decisions. Two patterns account for the majority of stalled builds.
Managing custom AI scope bloat
"Just a simple chatbot" is the most dangerous phrase in an AI project kickoff. Simple chatbots answer questions. What product teams actually need is an execution engine that navigates multi-step workflows, handles authentication flows, fills multi-field forms, and recovers gracefully from user corrections mid-workflow. Each capability is a separate engineering project, and the scope expands naturally as stakeholders see early demos and request the next feature.
The experiences page shows the full range of in-product responses, from guided explanations for users who need context to understand a feature, to direct execution for users who need a workflow completed. Most internal builds reach only a fraction of this range by month eight.
When to evaluate a vendor switch
The right time to evaluate a vendor switch from an in-progress custom build is before the team reaches month six with an unstable V1. The investment already spent in the first six months is real, but it is smaller than the ongoing cost of two to four engineers allocated to technical work indefinitely. A useful signal is the ratio of feature sprints to infrastructure sprints: when infrastructure work consumes a significant share of sprint capacity, the compounding cost of continuing to build internally typically exceeds the vendor cost within 12 months.
When custom AI drains development capacity, the activation gap that motivated the project in the first place stays open. Advanced feature adoption sits at 10 to 15% despite months of development investment, and this number does not improve when the engineering team building those features is splitting attention between new development and AI infrastructure work.
Making the build vs. buy decision
The right framework for this decision starts with your use case, your data requirements, and your team's capacity, not with the AI itself. Three factors determine the right answer: the conditions that justify a custom build, the compliance baseline for any vendor you evaluate, and the activation opportunity cost specific to your product.
Conditions for a custom AI build
A custom build makes sense under specific conditions:
Proprietary data constraints: Your AI requires fine-tuning on confidential datasets that cannot leave your infrastructure, such as a healthcare platform training on de-identified patient records, or a financial services product building risk models on transaction data governed by internal data residency policies.
Deep internal coupling: Your use case is tightly coupled to internal systems, with no vendor offering comparable integration depth.
Existing AI infrastructure: Your team already operates AI infrastructure that a vendor solution cannot extend without replacing it wholesale.
Outside these conditions, you are rebuilding commodity infrastructure at a cost that compounds each year.
Security and compliance baseline
Whether you are evaluating Sierra, Tandem, or any other AI vendor handling your product data, look for SOC 2 Type II certification, GDPR compliance documentation, and AES-256 encryption as the standard for B2B SaaS environments. Non-compliance with GDPR creates contractual liability and procurement delays that slow vendor onboarding, making compliance validation a standard requirement before any AI vendor handling EU personal data reaches production.
Tandem is SOC 2 certified and GDPR compliant, which removes this build from your internal engineering scope entirely. The digital adoption platform guide outlines what to evaluate when assessing vendor security posture.
Calculate your activation opportunity
Before your next planning cycle, measure two numbers: your trial-to-paid conversion rate and the percentage of users who complete core workflows within seven days. If conversion is below 20% and workflow completion is below 40%, the activation gap is costing you more in lost revenue than a purpose-built solution would cost in subscription fees.
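The back-of-envelope math suggested above can be written down directly. The revenue model here is a simplifying assumption (each additionally activated user converts at your trial-to-paid rate and is worth one ACV), not a benchmark:

```typescript
// Estimate annual recurring revenue recovered by an activation lift.
// Simplifying assumption: activated users convert at the trial-to-paid
// rate and each conversion is worth one ACV.
function recoveredArr(
  monthlySignups: number,
  activationLiftPts: number, // e.g. 0.18 for an 18-point lift
  trialToPaidRate: number, // conversion rate among activated users
  acv: number, // annual contract value per customer
): number {
  const extraActivatedPerYear = monthlySignups * 12 * activationLiftPts;
  return extraActivatedPerYear * trialToPaidRate * acv;
}
```

Plugging in your own signup volume, lift, conversion rate, and ACV turns the activation gap into a dollar figure you can weigh against either a build budget or a subscription fee.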
When Qonto deployed Tandem, the results went beyond percentage gains; 100,000 users discovered and activated paid features, with account aggregation activation doubling from 8% to 16%. The same pattern held across different product categories: self-serve account activation rose 20% at Aircall, and Sellsy lifted activation by 18 percentage points. Measure those outcomes against your own ACV and monthly signup volume to see what the ROI case looks like for your specific metrics.
If your custom build is absorbing sprint capacity without moving activation, schedule a demo with Tandem to see how quickly a contextual AI Agent deploys in your product and what activation lift looks like in your specific use case.
FAQs
How long does Tandem take to deploy compared to a custom AI build?
Tandem's JavaScript snippet installs in under an hour with no backend changes required, and product teams configure playbooks through a no-code interface within days. A custom AI build takes 12 to 18 months to reach stable, predictable performance across real-world use cases, encompassing latency, hallucination rate, robustness, and edge case handling.
What activation lift can product teams expect from an AI Agent?
Customers typically see 18 to 20 percentage point improvements in activation for complex multi-step processes. At Qonto, account aggregation activation doubled from 8% to 16%, and Aircall saw a 20% increase in self-serve account activation.
How does Sierra differ from an in-product AI Agent like Tandem?
Sierra focuses on multi-channel customer experience (chat, SMS, email, voice) with strong inquiry resolution for support use cases. AI Agents like Tandem focus on in-product workflow execution to drive feature activation and adoption, seeing what users see and executing actions directly inside the product UI.
What does a custom AI build actually cost?
A two-person team (one ML engineer plus one frontend specialist) runs $367,000 to $476,000 annually at fully loaded rates, based on levels.fyi compensation data and a 1.3x overhead multiplier. As with all in-app guidance platforms, ongoing content work is required whether you build or buy; the difference is whether technical infrastructure work also falls on your team.
Key terms glossary
Activation rate: The percentage of new users who reach a defined "aha moment" or complete core setup within a specified time window. Industry average for B2B SaaS sits at 36 to 38%, with many high-performing products investing in contextual in-product guidance.
Time-to-first-value (TTV): The elapsed time from a user's first login to their first meaningful outcome in the product. Lower TTV correlates directly with higher trial-to-paid conversion and reduced logo churn in the first 90 days.
Total cost of ownership (TCO): The fully loaded cost of a technology decision over a defined period, including initial development, personnel, infrastructure, and ongoing work, used to compare build vs. buy options on equivalent financial terms.
AI Agent: An AI system that perceives context, reasons about user goals, and takes actions in the environment, as opposed to a chatbot that generates text responses only. Production-grade AI Agents require action sequencing, context preservation, and failure recovery beyond what a conversational model alone provides.