AgenticUniverse - Previously Formi

Solving for STT Constraints

Overview#

This document outlines a plan to improve task completion by making the system better at handling transcript errors and missing transcripts. It covers the current challenges from both the End Customer (EC) and engineering perspectives, describes the ideal scenario, lists the constraints, defines the scope and core of the problem, and proposes initial steps to address it.

Current Scenario#

EC Perspective#

Inappropriate Responses:
While the agent is completing a task T to help the customer fulfill their intent, the end customer receives a response that does not match what they said. Repeated occurrences increase customer effort and lead to drop-offs.
Examples:
The agent asks about the customer's gender preference. The customer says "co-ed properties", but due to a transcription error it is recognized as "Co-Vid policies", and the agent discusses COVID policies instead.
Impact:
Customer effort increases because the customer has to clarify that they meant something else.
Silent Responses:
While the agent is completing a task T to help the customer fulfill their intent, the end customer receives no response at all to what they said, which leads to higher effort and drop-offs.
Examples:
The agent asks about gender preference and the customer says “Male”, but the STT does not transcribe it, so the agent does not respond.

Engineering Perspective#

STT Failures:
Audio input is processed by the Speech-to-Text (STT) system.
When the STT system returns a transcript that does not fit the task being performed (collecting name, email, property area, etc.), the agent has no action available to recognize the transcript as invalid for that task, so it chooses the wrong action to complete the task.
Inadequate Responses:
The agent gives inappropriate replies or remains silent because there is no mechanism to check transcript accuracy against the conversation context.
Absence of Observation Component:
No system is in place to observe and assess whether the transcript makes sense before a response is generated.
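
To make the missing check concrete, here is a minimal sketch of a task-fit validator: given the task the agent is performing, it decides whether a transcript plausibly answers it. The task names, the keyword set, and the `transcript_fits_task` helper are all hypothetical and chosen only for illustration; a production check would likely also use conversation context rather than keywords alone.

```python
import re
from typing import Callable, Dict

# Hypothetical validators: each returns True when the transcript plausibly
# answers the question the agent just asked for that task.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
GENDER_TERMS = {"male", "female", "co-ed", "coed", "boys", "girls", "any"}


def fits_email(text: str) -> bool:
    return EMAIL_RE.search(text) is not None


def fits_gender_preference(text: str) -> bool:
    # Tokenize into lowercase words, keeping hyphenated words such as "co-ed".
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)?", text.lower()))
    return bool(words & GENDER_TERMS)


# Assumed task identifiers; the real task taxonomy is not given in this note.
TASK_VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "collect_email": fits_email,
    "collect_gender_preference": fits_gender_preference,
}


def transcript_fits_task(task: str, transcript: str) -> bool:
    """Return False when the transcript is empty or does not fit the task."""
    if not transcript.strip():
        return False
    validator = TASK_VALIDATORS.get(task)
    # Unknown tasks are accepted: there is no basis to call them invalid.
    return True if validator is None else validator(transcript)
```

With this sketch, "co-ed properties" is accepted for the gender-preference task, while the mis-transcribed "Co-Vid policies" is rejected and can be routed to a corrective action instead of a normal reply.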

Ideal Scenario#

EC Perspective#

Accurate and Relevant Responses:
The agent always provides appropriate replies that resolve the customer’s intent and helps the customer navigate the conversation with minimum effort.
Error Handling:
If the end customer’s voice is not audible, the agent can ask the customer to repeat themselves by speaking louder, speaking more slowly, moving to a quieter environment, etc., depending on the scenario (to be decided with the solution hypothesis).
Smooth Conversation Flow:
Minimizes the need for customers to repeat themselves.
Increases resolution by the agent.

Scope of the Problem Capability#

Wrong Transcripts#

Objective:#

Develop a solution to identify and correct instances where the transcribed text does not accurately reflect what was said during the conversation.

Missing Transcripts#

Objective:#

Develop a solution to observe and handle situations where parts of the conversation are not transcribed at all.

Constraints, Facts, Assumptions to Consider while Solving the Problem#

Wrong Transcripts

Constraints:#

LLM-Triggered Identification: The LLM can only identify and trigger events once the STT provides the transcript.
High False Positives: The LLM may generate a high number of false positives for identification and trigger, which can affect the overall conversation if the system does not accommodate them. This is because the trigger depends on a prompt, and the prompt will not be 100% accurate when first pushed live.

Facts:#

Five-Stage Process: The process involves five stages: identification, trigger, verification, reasoning, and action.
Identification: Recognizing that the transcript received is not correct given the task being performed, the conversation so far, the customer’s intent, and the business’s context/goals.
Trigger: Initiating a notification to the reasoning engine to decide the next action.
Reasoning: Deciding the next best action based on constraints and environmental factors.
Action: LLM executing the chosen action.
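
The stages above can be sketched as a small pipeline. This is an illustrative outline only: the `Turn` type, the `handle_turn` function, and the callables passed into it are hypothetical names, and the verification step is modeled here as a simple yes/no confirmation of the identification, which is an assumption since this note does not define that stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    task: str        # task the agent is performing (assumed identifier)
    transcript: str  # text returned by the STT for the customer's utterance


def handle_turn(
    turn: Turn,
    identify: Callable[[Turn], bool],
    verify: Callable[[Turn], bool],
    reason: Callable[[Turn], str],
    act: Callable[[str], None],
) -> Optional[str]:
    """Run the five stages; return the corrective action, or None if none was needed."""
    if not identify(turn):   # 1. identification: does the transcript look wrong?
        return None
    # 2. trigger: in this sketch, triggering is just continuing into the pipeline.
    if not verify(turn):     # 3. verification (assumed): filter out false positives
        return None
    action = reason(turn)    # 4. reasoning: choose the next best action
    act(action)              # 5. action: execute the chosen action
    return action


# Usage with stub stages (all stub logic is made up for the example).
executed: List[str] = []
result = handle_turn(
    Turn(task="collect_gender_preference", transcript="Co-Vid policies"),
    identify=lambda t: "male" not in t.transcript.lower(),
    verify=lambda t: True,
    reason=lambda t: "ask_to_repeat_slowly",
    act=executed.append,
)
```

Keeping each stage as a separate callable makes it possible to measure and tune them independently, for example swapping the prompt-based identifier without touching the reasoning step.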

Assumptions:#

Acceptable Latency: Additional latency of 1-2 seconds is acceptable when reasoning about which action to take for a wrong transcript, and it can be masked with agent utterances.
Focus on Loudness and Speed: The corrective actions will primarily involve influencing the speaker's loudness and speed, along with guessing what the person might have said given the task being performed, the intent, and the context of the conversation.
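
One way to mask the extra latency with an agent utterance is sketched below using `asyncio`. The function name, the filler phrase, and the timing values are assumptions for illustration; `reason` and `speak` stand in for whatever the reasoning engine and speech output actually expose.

```python
import asyncio
from typing import Awaitable, Callable


async def reason_with_filler(
    reason: Callable[[], Awaitable[str]],
    speak: Callable[[str], Awaitable[None]],
    filler: str = "One moment, please.",
    filler_after: float = 0.3,  # assumed: speak a filler if reasoning takes longer
    budget: float = 2.0,        # assumed: upper end of the acceptable 1-2 s latency
) -> str:
    """Run the reasoning step, speaking a filler utterance if it is slow."""
    task = asyncio.create_task(reason())
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        # Reasoning is still running: mask the wait with an agent utterance.
        await speak(filler)
    # Raises a timeout error (and cancels the task) if the budget is exceeded.
    return await asyncio.wait_for(task, timeout=budget)
```

If reasoning finishes within `filler_after` seconds, the customer hears nothing extra; otherwise the filler plays while the decision completes.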

Missing Transcripts

Facts:#

LLM's Inability for Audio Triggers: The LLM cannot trigger actions based on audio input before a transcript is available. This prevents us from assigning the identification step to any component on the LLM side, since the LLM only triggers when there is a transcript.
Control Over Shared Audio: Formi has control over the audio being shared among all three parties (our reasoning engine, LLM, and Exotel).

Constraints:#

Internal Identification and Trigger: Since the LLM cannot detect missing transcripts, identification and triggering must be handled internally by our observer.
Unaltered Customer Experience: Any influence on the audio should not affect what the customer hears; the customer's experience must remain unchanged.

Assumptions:#

Focus on Loudness and Speed: As with wrong transcripts, the agent's corrective actions will focus on prompting the customer to adjust their loudness and speed.
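
Because identification has to happen internally, the observer needs some signal that the customer spoke even though no transcript arrived. The sketch below uses a crude energy threshold over raw audio frames as a stand-in for real voice activity detection. The 16-bit little-endian mono PCM format, the threshold values, and the function names are all assumptions; it only reads the audio and never modifies what the customer hears.

```python
import math
import struct
from typing import Iterable, Optional


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def missing_transcript(
    frames: Iterable[bytes],
    transcript: Optional[str],
    energy_threshold: float = 500.0,  # assumed loudness cut-off
    min_voiced_frames: int = 5,       # assumed minimum amount of speech-like audio
) -> bool:
    """Flag a turn where audio looked like speech but no transcript was produced."""
    if transcript and transcript.strip():
        return False  # a transcript exists, so nothing is missing
    voiced = sum(1 for frame in frames if rms(frame) >= energy_threshold)
    return voiced >= min_voiced_frames
```

A turn flagged this way can then be handed to the reasoning engine, which may have the agent ask the customer to speak louder or more slowly.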

KPIs to Track#

Internal Metrics
False Positive Rate
Description: The percentage of times correct transcripts were incorrectly flagged as wrong.
Calculation: (Number of False Positives / Total Number of Triggers) × 100%
Value: Indicates the accuracy of the identification process and helps in refining verification mechanisms.
Verification Success Rate
Description: The percentage of triggers that, after verification, were confirmed as true positives.
Calculation: (Number of True Positives after Verification / Total Number of Triggers) × 100%
Value: Assesses the effectiveness of the verification stage in filtering out false positives.
Average Latency Introduced
Description: The average time delay (in seconds) added to the conversation due to the correction process.
Value: Helps evaluate whether the latency stays within the acceptable range (1-2 seconds) and its impact on conversation flow.
External Metrics
Success Rate of Corrective Actions
Description: The percentage of corrective actions that successfully resolved the transcription errors.
Calculation: (Number of Successful Corrections / Total Corrective Actions Taken) × 100%
Value: Measures the effectiveness of the actions in improving transcription accuracy.
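
The calculations above translate directly into code. The sketch below is illustrative: the function and field names are hypothetical, and average latency is computed as a plain arithmetic mean, which is an assumption since no formula is given for it.

```python
from typing import Dict, Sequence


def percentage(part: int, total: int) -> float:
    """(part / total) x 100, returning 0.0 when there is nothing to divide by."""
    return 0.0 if total == 0 else 100.0 * part / total


def compute_kpis(
    total_triggers: int,
    false_positives: int,
    verified_true_positives: int,
    successful_corrections: int,
    corrective_actions: int,
    latencies_seconds: Sequence[float],
) -> Dict[str, float]:
    avg_latency = (
        sum(latencies_seconds) / len(latencies_seconds) if latencies_seconds else 0.0
    )
    return {
        "false_positive_rate": percentage(false_positives, total_triggers),
        "verification_success_rate": percentage(verified_true_positives, total_triggers),
        "average_latency_seconds": avg_latency,  # assumed: arithmetic mean
        "corrective_action_success_rate": percentage(
            successful_corrections, corrective_actions
        ),
    }


# Example with made-up counts: 40 triggers, 10 of them false positives.
example = compute_kpis(
    total_triggers=40,
    false_positives=10,
    verified_true_positives=30,
    successful_corrections=18,
    corrective_actions=24,
    latencies_seconds=[1.0, 1.5, 2.0],
)
```

Note that both trigger-based rates share the same denominator, so when every trigger is classified as either a false positive or a verified true positive, the two rates sum to 100%.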

Solution Hypothesis#

[Figure: solution hypothesis diagram (Mermaid chart; image not reproduced in this text)]
Modified at 2025-08-09 11:58:28