Adding AI Agent capabilities to your existing copilot: Screen awareness and action execution as a library
Christophe Barre
co-founder of Tandem
Add screen awareness and action execution to your existing AI copilot using a capability library with no backend changes required.
Updated April 13, 2026
TL;DR: You don't need to rebuild your existing AI copilot to add screen awareness and action execution. A capability library injects these features at the UI layer using a JavaScript snippet with no backend changes. Technical setup is straightforward, and product teams configure workflows in days through a no-code interface. Aircall achieved a 20% activation lift this way. Qonto doubled feature adoption for multi-step workflows (8% to 16%) without rebuilding. Enhance, don't replace.
Your existing AI copilot can answer questions, and it probably does that reasonably well, but it's blind to the user's screen, can't execute a single step in a workflow, and has no idea whether the person typing is on the billing page or mid-way through a configuration flow. For product and CX leaders managing activation challenges, this gap directly impacts your metrics because only 36–38% of SaaS users successfully activate, leaving 60%+ of your signups stuck before they reach core product value. The instinct many product teams act on is a full rebuild, but that's almost always the wrong call.
Uncovering copilot feature adoption gaps
When your copilot can't see the screen or take action, you've built a search bar with a chat interface. Users move through your product expecting contextual help, not document retrieval, and when the AI fails to meet that expectation, feature adoption stalls. Advanced features often struggle with adoption despite months of engineering investment, and onboarding metrics tell the story plainly: users are bouncing at the hard steps.
Why copilots fail at action execution
Most copilots built in the last two years operate the same way: a user types a question, the LLM retrieves from a knowledge base or help doc, and text appears. That covers the "explain" use case and almost nothing else.
When a user needs to connect a Salesforce integration, configure team permissions, or complete a multi-field compliance form, a text response saying "click Settings, then Integrations, then Salesforce" fails if the user is already on the wrong page staring at an unfamiliar interface. Only 5% complete multi-step product tours, and a copilot that can only narrate instructions performs no better. Action execution, where the AI actually fills the form, triggers the configuration, or navigates the workflow, is what converts intent into activation.
DOM-based screen awareness changes this fundamentally. Instead of guessing at UI state from documentation, the AI reads the live document object model, giving it precise, real-time information about every element on the page, including what's enabled, what's filled, and what's next. The difference between "there appears to be a button" and "there is an enabled Submit button with ID submit-form" is the difference between generic guidance and contextual help.
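To make the difference concrete, here is a minimal sketch of how DOM-derived element state could be summarized into a context string an LLM can ground on. The element descriptors are hypothetical stand-ins for what a real library would read from live DOM queries; the function name and fields are invented for illustration:

```javascript
// Illustrative sketch: turn live element state into a compact context
// string for the AI. Each descriptor stands in for what a real library
// would read from the page's document object model.
function summarizeScreen(elements) {
  return elements
    .map((el) => {
      const state = el.disabled ? "disabled" : "enabled";
      if (el.tag === "input") {
        const fill = el.value ? `filled ("${el.value}")` : "empty";
        return `${state} input #${el.id}, ${fill}`;
      }
      return `${state} ${el.tag} #${el.id} ("${el.label}")`;
    })
    .join("; ");
}

const context = summarizeScreen([
  { tag: "input", id: "api-key", disabled: false, value: "sk-123" },
  { tag: "button", id: "submit-form", disabled: false, label: "Submit" },
]);
console.log(context);
// e.g. 'enabled input #api-key, filled ("sk-123"); enabled button #submit-form ("Submit")'
```

A summary like this is what lets the AI say "there is an enabled Submit button" as a statement of fact rather than a guess from documentation.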
Escalating cost of AI copilot upkeep
In-house builds demand significant engineering effort to keep prompts and context aligned with your product as it evolves, requiring continuous work to account for navigation changes, renamed fields, and new features.
The cost of building AI agents runs $200,000 to $500,000 or more for initial development, with substantial ongoing maintenance costs. For screen-aware, action-executing copilots specifically, AI agent development research indicates builds can take several months for moderate complexity and six months or longer for full multi-agent systems. That's before a single user benefits.
Existing copilot: Enhance or replace?
The decision comes down to one question: is your copilot's architecture fundamentally incompatible with integration, or does it just lack specific capabilities? In most cases the answer is the latter, which means enhancement is the right path and a rebuild is an avoidable cost.
Library integration for AI capabilities
A capability library differs from a general-purpose API. It's a specific UI-layer component that plugs into your application's front end, reads screen state in real time, and executes defined actions on behalf of the user, all without touching your existing backend. You keep your chat interface, your LLM routing logic, and your existing user session handling while the library adds screen awareness and action execution on top.
The Tandem AI agent works exactly this way: one JavaScript snippet, no backend changes, no SDK sprawl. Product teams then configure the workflows the agent handles through a no-code interface, specifying when to explain, when to guide step-by-step, and when to execute directly. Your existing copilot's conversational logic remains intact and the library handles what it couldn't do before.
Enhancement preserves investment and accelerates value
Qonto's product team considered building their own solution from scratch before evaluating the alternatives. The analysis was straightforward: Qonto's implementation with Tandem deployed in days, versus a 6+ month in-house build timeline. The result was measurable activation improvements, with account aggregation activation doubling from 8% to 16%. Building from scratch would have consumed the same engineering budget with no guarantee of reaching those numbers.
If your copilot already handles user queries and routes requests reasonably well, you're starting from a better position than you think. Screen awareness and action execution slot in at the UI layer without requiring you to discard prior work.
How AI explains, guides, and executes: The three modes
The shift from "answers questions" to "completes workflows" isn't a UX enhancement; it's the difference between a copilot users tolerate and one that actually drives activation. Understanding all three modes (explain, guide, and execute) is what allows product teams to deploy AI help appropriate to what each user needs at each moment.
Screen awareness: Seeing what users see
Screen awareness allows the AI to understand what users see on their screen. The system can perceive the current page context, which means it knows the user is on the Salesforce integration page, has already filled in the API key field, and is stuck on the permissions dropdown.
This approach aims to improve accuracy by having the AI read actual rendered state rather than generating answers from static documentation. Tandem reads only what's visible on screen, which keeps responses grounded in the interface the user is actually looking at. Tandem is SOC 2 Type II certified, GDPR compliant, and encrypts data with AES-256.
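Because the library reads screen content, sensitive fields need to stay out of the context feed. A minimal sketch of client-side redaction before any context is assembled (the field ids and function are hypothetical, not Tandem's actual configuration API):

```javascript
// Illustrative sketch: scrub configured sensitive fields client-side
// before screen context is assembled. Field ids are invented examples.
const SENSITIVE_FIELDS = new Set(["ssn", "card-number"]);

function redactFields(fields) {
  return fields.map((f) =>
    SENSITIVE_FIELDS.has(f.id) ? { ...f, value: "[REDACTED]" } : f
  );
}

const safe = redactFields([
  { id: "company-name", value: "Acme" },
  { id: "card-number", value: "4242 4242 4242 4242" },
]);
console.log(safe[1].value); // "[REDACTED]"
```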
AI performing tasks in your product
Action execution leverages browser interactions: the library can interact with UI elements like form fields and buttons as if the user had done it themselves. Common actions like filling forms or clicking buttons happen safely and predictably.
Product teams control the scope of execution entirely. You define which workflows the agent can execute, which actions require confirmation, and which areas are off-limits. Playbooks define these constraints through a no-code interface, and the AI operates only within the boundaries you set. Users expect software that understands what they're doing and helps accordingly, whether that means answering a question, walking through steps, or completing a workflow step on their behalf.
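Conceptually, that boundary-setting amounts to a policy check in front of every action. A hedged sketch, assuming an invented policy shape and action types (not Tandem's real schema):

```javascript
// Illustrative sketch: gate each proposed agent action against a
// product-team policy before anything executes. All names are invented.
const policy = {
  allowed: new Set(["fill_field", "click_button"]),
  requireConfirmation: new Set(["click_button"]),
};

function planAction(action) {
  if (!policy.allowed.has(action.type)) return { status: "blocked" };
  if (policy.requireConfirmation.has(action.type))
    return { status: "awaiting_confirmation", action };
  return { status: "execute", action };
}

console.log(planAction({ type: "fill_field", target: "#api-key" }).status);  // "execute"
console.log(planAction({ type: "click_button", target: "#submit" }).status); // "awaiting_confirmation"
console.log(planAction({ type: "delete_account" }).status);                  // "blocked"
```

The point of the sketch is that "off-limits" is an architectural default: anything not explicitly allowed never runs.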
When a UI element changes, the library's self-healing architecture detects and adapts automatically to maintain functionality.
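One common way to implement that kind of adaptation is selector fallback: try the recorded selector first, then progressively more stable attributes, and surface a notification instead of failing silently. This is a generic sketch of the pattern, not Tandem's internal mechanism; the helper and data shapes are invented:

```javascript
// Illustrative "self-healing" sketch: resolve an element by trying
// candidate selectors in order of preference. The array of element
// records stands in for real DOM queries.
function resolveElement(dom, candidates) {
  for (const selector of candidates) {
    const el = dom.find((e) => e.matches.includes(selector));
    if (el) return { el, selector };
  }
  return { el: null, notify: "selector drift: manual review needed" };
}

const dom = [
  { matches: ["[data-testid=save]", "button.btn-primary"], label: "Save" },
];

// The old id-based selector disappeared in a redesign; the stable
// data attribute still resolves.
const result = resolveElement(dom, ["#save-btn", "[data-testid=save]"]);
console.log(result.selector); // "[data-testid=save]"
```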
Full self-serve workflows in action
At Aircall, adding this layer of contextual assistance produced a 20% activation increase for self-serve accounts. Advanced features that previously required human explanation became self-serve because the AI could explain requirements when users needed clarity, walk through setup steps when they needed direction, and execute configuration when they needed speed. The same workflow, three different help modes, each deployed based on what the user actually needed in that moment.
Integrating capability libraries: The setup
The technical barrier to integration is lower than most product teams expect, and understanding what happens at each layer helps set realistic expectations with engineering and leadership.
Implementation scope: UI vs. backend
The library typically integrates primarily through a front-end JavaScript snippet, similar to the integration pattern used by analytics tools or chat widgets. In many implementations, this minimizes backend changes and reduces the need for new API endpoints or modifications to your authentication system or data models. The security review is scoped to the client-side library: what data it reads, how it's transmitted, and what certifications cover it. A front-end focused deployment generally takes under an hour, while more extensive backend integrations require lengthier engineering review.
Fitting AI into your copilot architecture
The library operates as a UI layer between your application's rendered interface and your existing copilot's chat component. Your LLM calls, session routing, and user authentication remain exactly where they are. The capability library adds a screen-context feed to whatever query the user submits and adds an action execution interface for workflows you define.
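The non-invasive part of that architecture can be sketched as a wrapper: the existing copilot call path stays untouched, and the library simply prepends screen context to the query. Both functions below are hypothetical stand-ins, not a real API:

```javascript
// Illustrative sketch: augment an existing copilot's query with a
// screen-context feed without touching its backend or routing.
function withScreenContext(askCopilot, getScreenContext) {
  return (userQuery) =>
    askCopilot(`${userQuery}\n\n[screen context] ${getScreenContext()}`);
}

// Fake "existing copilot" that just echoes its prompt for demonstration.
const echoCopilot = (prompt) => prompt;
const ask = withScreenContext(echoCopilot, () => "on /settings/integrations");

const prompt = ask("How do I connect Salesforce?");
console.log(prompt.includes("[screen context] on /settings/integrations")); // true
```

Because the wrapper sits at the UI layer, swapping the underlying copilot or its LLM provider requires no change to the capability layer.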
For product teams evaluating the Tandem vs. Vercel AI SDK decision, the architectural distinction is that a capability library handles screen integration and workflow execution out of the box, while a DIY SDK approach requires you to build those layers yourself.
Selecting the right AI integration library
Not all capability libraries are equivalent. Evaluating vendors on the right criteria keeps you from trading one underperforming copilot for another.
Does it work with your existing copilot?
The integration path should be additive, not destructive. A library that requires you to replace your chat UI, re-route your LLM calls, or migrate your existing session data isn't a library; it's a replacement platform in disguise. Ask specifically whether the capability layer can sit alongside your current copilot components or must take full ownership of the user-facing interaction. Those are the only two options, and the answer tells you immediately which category you're dealing with. Tandem is designed to integrate with existing applications without requiring you to abandon prior investment. Test the library against edge cases and error states, not just the configured happy path, because a copilot that breaks on unusual form layouts undermines trust in the feature.
Cost of running AI features
The sticker price of a capability library competes against two alternatives: the engineering cost of building in-house and the revenue cost of not activating users. Both are larger than most teams initially calculate.
| Approach | Upfront Time | Annual Maintenance | Primary Owner |
|---|---|---|---|
| Build in-house (screen awareness) | Months of development | Ongoing engineering resources | Varies by team |
| Traditional DAP (Pendo, WalkMe) | Implementation period | Content management | Typically Product + Engineering |
| Capability library (Tandem) | Minimal setup | Ongoing refinement | Product team |
On the revenue side: as an illustration, consider 10,000 signups at a 35% activation baseline and $800 ACV, which would generate $2.8M ARR from activated users. Lifting that baseline to 42% would add $560,000 in new ARR from the same acquisition volume, with no increase in sales or CS headcount. That's the type of number to run against vendor cost.
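The arithmetic behind that illustration, using the same hypothetical figures from the text:

```javascript
// ARR from activated users = signups × activation rate × ACV.
function activatedArr(signups, activationRate, acv) {
  return signups * activationRate * acv;
}

const baseline = activatedArr(10000, 0.35, 800); // ≈ $2.8M ARR
const lifted = activatedArr(10000, 0.42, 800);   // ≈ $3.36M ARR
console.log(Math.round(lifted - baseline));      // 560000 in new ARR
```

Swap in your own signup volume, activation baseline, and ACV to produce the number to run against vendor cost.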
What breaks when you update your UI
All in-app guidance platforms require ongoing content management. When you ship a new feature or redesign a workflow, someone on the product team updates the playbooks and messaging for that area, and this is true regardless of which platform you use.
What differs is whether UI changes also require engineering intervention to fix broken selectors or update DOM references. Tandem adapts automatically to minor changes in most cases and notifies you about major structural changes rather than failing silently. For a direct comparison of how traditional DAPs handle UI changes, the contrast in maintenance allocation is significant: product teams manage content, while engineering stays focused on core development.
Minimizing AI errors and hallucinations
Playbook constraints and real-time DOM grounding work together to reduce hallucinations architecturally, not just through prompting. The AI reads the actual rendered UI state, so it cannot generate guidance about elements that don't exist in the current context. Combining this grounding with playbook-based guardrails produces the reliable in-product behavior users need to trust the assistant, because the AI operates only within the workflows you've defined and the screen state it can read.
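The grounding half of that claim can be sketched generically: filter every step the model proposes against the element ids actually present on screen, so references to nonexistent UI never reach the user. The function and data shapes are invented for illustration:

```javascript
// Illustrative sketch of DOM grounding: drop any proposed step whose
// target element is not actually present in the current screen state.
function groundSteps(proposedSteps, visibleIds) {
  const onScreen = new Set(visibleIds);
  return proposedSteps.filter((step) => onScreen.has(step.targetId));
}

const grounded = groundSteps(
  [
    { text: "Click Submit", targetId: "submit-form" },
    { text: "Open the Export tab", targetId: "export-tab" }, // hallucinated
  ],
  ["api-key", "submit-form"]
);
console.log(grounded.length); // 1 — only the real element survives
```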
Build vs. buy: TCO for new AI capabilities
Build vs. buy: Development and maintenance costs
Building DOM parsing and context understanding from scratch requires a significant custom engineering effort to make live UI state usable by the LLM layer. According to AI agent development research, mid-complexity builds take three to five months and full multi-agent systems with screen interaction take six to twelve months. A development effort of this scale requires substantial engineering resources, with annual maintenance typically consuming 20 to 30% of initial development costs according to AI agent cost analysis. A vendor solution where product teams manage content via a no-code interface redirects that ongoing allocation back to core product development.
Calculate activation lift ROI, not maintenance savings
The ROI calculation that matters isn't maintenance hours saved but activation lift converted to revenue. When Product and CX teams own content updates through a no-code interface, without routing changes through engineering, maintenance overhead drops, but that's a secondary benefit. The primary number is activation lift. At Qonto, 100,000+ users activated paid features through AI-guided workflows, with account aggregation activation doubling from 8% to 16%. For a product with 10,000 signups, 35% baseline activation, and $800 ACV: lifting activation to 42% adds $560,000 in new ARR from the same acquisition volume. That's the number to bring to your leadership team, not a maintenance hours comparison.
Implementation path: Adding capabilities without rebuilding
Once you've confirmed enhancement makes business sense, the implementation follows a clear sequence that product teams can drive without engineering involvement after the initial snippet deployment.
Add AI capabilities as a library
The mechanics are straightforward:
1. Add the JavaScript snippet to your application's head or body, registering the capability library with your existing front end with no effect on your backend services.
2. Configure the agent appearance to match your existing copilot's visual style, positioning it as an enhancement to what users already know rather than a new tool to learn.
3. Point the agent to your help docs through the configuration interface so the AI has access to your existing content alongside the new screen context.
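As a sketch of what the snippet step typically looks like, here is the general pattern; the CDN URL and attribute names below are invented placeholders, not Tandem's actual embed code (your vendor dashboard provides the real tag):

```html
<!-- Hypothetical snippet pattern: URL and attributes are placeholders. -->
<script async src="https://cdn.example.com/agent.js"
        data-app-id="YOUR_APP_ID"></script>
```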
Steps for copilot integration
The configuration work follows a clear sequence:
1. Define your workflow targets: Identify the highest-friction activation flows where users currently drop off or generate support tickets.
2. Build playbooks for each workflow: Specify when the AI should explain (user is exploring a new feature), guide (user is mid-setup and needs step-by-step direction), or execute (user needs to complete a repetitive multi-field task).
3. Set action permissions: Define exactly which UI interactions the agent can execute and which require explicit user confirmation.
4. Configure proactive triggers: Set conditions under which the assistant surfaces help automatically when user behavior indicates they need assistance.
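The configuration steps above amount to a structure like the following. Every field name here is invented to show the shape of the decisions, not Tandem's actual playbook schema; a no-code interface would store an equivalent structure for you:

```javascript
// Hypothetical playbook covering the four configuration steps:
// workflow target, mode per situation, action permissions, and a
// proactive trigger.
const playbook = {
  workflow: "salesforce-integration-setup",
  modes: {
    exploring: "explain",      // user is browsing a new feature
    midSetup: "guide",         // user needs step-by-step direction
    repetitiveForm: "execute", // agent completes the fields itself
  },
  permissions: {
    allowed: ["fill_field", "navigate"],
    confirmFirst: ["submit_form"],
  },
  proactiveTrigger: { event: "idle_on_page", thresholdSeconds: 60 },
};

// Selecting the help mode for a user who is mid-setup:
console.log(playbook.modes.midSetup); // "guide"
```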
For teams evaluating the execution-first AI comparison against guidance-only tools, the key difference is whether the tool guides users toward UI elements or actually completes workflow steps.
Go-live strategy for AI copilot
Measure from day one. The metrics that tell you the enhancement is working are activation rate for the specific workflows you've targeted (compared to pre-integration baseline), time-to-first-value for new signups, and support ticket volume for the workflows the AI now handles.
Aircall's 20% activation lift was measurable within the first weeks of deployment. Qonto's feature adoption doubled for account aggregation by targeting the highest-friction workflows first. Start narrow, measure fast, and expand to additional workflows based on what the dashboard shows users are still struggling with. The user activation strategies guide covers how to prioritize which workflows to target first by SaaS category.
Content work is constant; technical work shouldn't be
Your product will keep shipping. UI changes are not an exception. They're the operating condition. Product teams will always update messaging, refine targeting logic, and add new playbooks when features launch, and that's true regardless of which platform you use.
What changes with a properly architected capability library is the split between content work (product team, ongoing) and technical fixes (engineering, rare). Your product team owns the quality of the AI experience. Your engineering team doesn't own a standing maintenance allocation. That split is what makes meaningful activation improvement achievable at the pace most product leaders actually need.
Your activation rate is either growing or it isn't. A copilot that can only answer questions and can't see the screen won't move that number. The path forward isn't a rebuild; it's the right library deployed against the right workflows, measured from day one.
If your current activation rate sits below 40% and users drop out during complex setup flows, see how Tandem integrates with your existing copilot in a 20-minute demo and calculate what an activation lift could mean for your ARR at current acquisition volume.
FAQs
Can I add Tandem's AI Agent to my existing copilot without replacing the chat interface?
Yes. A capability library integrates at the UI layer via a JavaScript snippet, so your existing chat interface, LLM routing, and session handling remain intact. You're adding screen context and action execution to what your copilot already does, not replacing it.
What happens to the AI experience when I ship a UI update?
The library adapts automatically when minor CSS or DOM changes occur. For major structural changes, the AI experience reverts to your native UI and your team receives a notification rather than users encountering a broken experience.
How long does integration actually take?
Technical setup (snippet installation and initial configuration) takes under an hour. Product teams typically deploy the first AI-guided workflows within days using the no-code playbook interface.
Does screen awareness mean the AI stores what's on the user's screen?
No. Agents work in real time on the client side without storing user data, and you can configure the library to exclude specific sensitive fields such as SSNs or payment card numbers.
When does a full copilot rebuild actually make sense?
A rebuild might make sense in rare cases where your existing architecture has fundamental constraints that can't be addressed through integration, but most teams find enhancement faster and less expensive. The typical path is to deploy a library first, measure activation improvement, and only consider a rebuild if specific technical limitations emerge that block your goals.
How does a playbook-based library reduce AI hallucinations?
The AI reads the actual rendered DOM state, so it cannot generate guidance about UI elements that don't exist in the current context. Playbook constraints further limit the AI to pre-defined workflows and approved actions, and the combination of real-time grounding and scope constraints addresses the primary failure modes that cause hallucinations in general-purpose chatbots.
What does screen awareness cost to build in-house?
Building a screen-aware, action-executing AI agent in-house requires substantial engineering resources and time. Development efforts typically span several months, with mid-complexity builds taking three to five months and full multi-agent systems with action execution taking six to twelve months or longer.
Key terms glossary
Activation rate: The percentage of new signups who reach a defined first-value moment ("aha moment") within a set time period. The industry average for B2B SaaS is 37.5%.
Capability library: A modular, UI-layer component that adds specific AI capabilities (screen awareness, action execution) to an existing application without requiring backend changes or a full platform replacement.
DOM (Document Object Model): The structured representation of a web page as rendered in the browser, including all visible elements, their states, and their properties. A DOM-based AI reads this live structure to understand what the user currently sees.
Action execution: An AI capability that completes UI interactions (form fills, button clicks, navigation steps) on the user's behalf using native browser interactions rather than backend API calls.
Time-to-first-value (TTV): The time elapsed between a user's first login and their first meaningful product outcome. Shorter TTV correlates directly with higher activation and retention rates.
Playbook: A product-team-configured set of instructions that defines when and how an AI agent should explain, guide, or execute within specific application workflows. Playbooks constrain AI behavior to pre-defined, approved actions.
Product-led growth (PLG): A go-to-market strategy where the product itself drives user acquisition, activation, and expansion without requiring sales-assisted motion for every conversion.