Formi WebSocket Configuration Guide

Overview

Formi enables intelligent AI-powered voice interactions over phone calls through real-time audio streaming. This guide is for any client or partner integrating third-party telephony services (e.g., Exotel, Twilio, Knowlarity, Convox, Tata Tele) with Formi using WebSockets.

This guide explains how to:

Set up the WebSocket connection

Validate the set of events that Formi's system expects

Configure audio formatting that Formi's system expects

Establish a direct WebSocket connection with Formi

Stream and receive audio in real-time

It is designed for teams implementing either:

Unidirectional streams (call monitoring or transcription)

Bidirectional streams (interactive AI voice agents)

WebSocket Connection URL

Connection Endpoint

wss://<formi’s-domain>/ws-adapter/{provider}/{call_type}/{agent_id}/{outlet_id}/{virtual_number}?caller_id={caller_id}

<formi’s-domain> : staging-api-2.formi.co.in or api-2.formi.co.in

Required Path Parameters

Parameter	Type	Description	Example
`provider`	string	Short name for telephony provider (lowercase)	`twilio`, `exotel`
`call_type`	string	Direction of the call, one of inbound or outbound	`inbound`, `outbound`
`agent_id`	integer	Unique identifier for the agent	`12345`
`outlet_id`	integer	Unique identifier for the outlet/client	`67890`
`virtual_number`	string	Unique identifier for the configured number	`+1234567890`

Optional Query Parameters

Parameter	Type	Description	Default
`caller_id`	string	Customer's phone number	Auto-generated UUID

Example Connection URLs

# Twilio inbound call
wss://staging-api-2.formi.co.in/ws-adapter/twilio/inbound/123/456/1234567890?caller_id=919701966915

# Exotel outbound call
wss://api-2.formi.co.in/ws-adapter/exotel/outbound/789/012/09876543211?caller_id=9701966915
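The URL pattern above can be assembled programmatically. The following is a minimal sketch, not Formi's own client code: the helper name `buildFormiUrl`, its option names, and the decision to percent-encode each segment are assumptions made for illustration.

```javascript
// Hypothetical helper: builds a Formi WebSocket URL from the documented path
// and query parameters. The function name and validation rules are assumptions.
function buildFormiUrl({ domain, provider, callType, agentId, outletId, virtualNumber, callerId }) {
  if (!['inbound', 'outbound'].includes(callType)) {
    throw new Error(`call_type must be "inbound" or "outbound", got "${callType}"`);
  }
  // Percent-encode each path segment (assumption: the server decodes them).
  const segments = [provider.toLowerCase(), callType, agentId, outletId, virtualNumber]
    .map(segment => encodeURIComponent(String(segment)));
  let url = `wss://${domain}/ws-adapter/${segments.join('/')}`;
  // caller_id is optional; the table above lists an auto-generated UUID as the default.
  if (callerId !== undefined) {
    url += `?caller_id=${encodeURIComponent(String(callerId))}`;
  }
  return url;
}

console.log(buildFormiUrl({
  domain: 'staging-api-2.formi.co.in',
  provider: 'twilio',
  callType: 'inbound',
  agentId: 123,
  outletId: 456,
  virtualNumber: '1234567890',
  callerId: '919701966915',
}));
```

With these inputs the helper reproduces the Twilio inbound example URL shown above.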

Universal Telephony Adapter

Formi's Universal Telephony Adapter is designed to work with any telephony provider regardless of their specific event naming conventions, payload structures, or audio encoding formats. The system adapts to different provider implementations while maintaining consistent internal processing.

Key Features

Provider Agnostic: Works with any telephony provider's WebSocket implementation

Dynamic Event Mapping: Automatically maps provider-specific events to Formi's internal event types

Audio Format Flexibility: Supports multiple audio encoding formats and automatic conversion

Template-Based Configuration: Uses JSON templates for easy provider integration

Event Types and Requirements

Formi expects four mandatory event types from all telephony providers. These events can have different names and payload structures across providers, but the core functionality must be present.

1. Handshake Event (Connected and Stop Event)

Purpose: Confirms at the start of the call that the WebSocket connection is established and the communication channel is ready, and sends a similar acknowledgement at the end of the call.

Formi Internal Name: connected, stop

Requirements:

The connected event must be the first event sent after the WebSocket connection is established

Should contain connection metadata

Confirms bidirectional communication capability

Expected Information:

Connection status

Session identifier (if available)

Provider-specific metadata

2. Meta Event (Start Event)

Purpose: Initiates the call session and provides call context.

Formi Internal Name: start

Requirements:

Must be sent before audio streaming begins

Contains call setup information

Provides context for the AI system

Expected Information:

Call direction (inbound/outbound)

Caller information

Call timestamp

Any additional call metadata

3. Audio Event (Media Event)

Purpose: Streams real-time audio data between telephony provider and Formi.

Formi Internal Name: media

Requirements:

MANDATORY: Must support bidirectional audio streaming

Must maintain consistent audio format throughout the session

Should handle audio buffering appropriately

Must support real-time streaming with minimal latency

Audio Format Requirements:

Sample rate: 8kHz (preferred) or 16kHz

Encoding: PCM, μ-law, or A-law

Channels: Mono (1 channel)

Bit depth: 8-bit or 16-bit
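The requirements listed above can be checked before a stream is opened. The sketch below is illustrative only: the function name `checkAudioFormat`, the field names, and the lowercase encoding labels (`pcm`, `mulaw`, `alaw`) are assumptions, not identifiers defined by Formi.

```javascript
// Allowed values taken from the Audio Format Requirements list above.
// The lowercase encoding labels are hypothetical names for PCM, μ-law and A-law.
const SUPPORTED = {
  sampleRates: [8000, 16000],
  encodings: ['pcm', 'mulaw', 'alaw'],
  channels: [1],
  bitDepths: [8, 16],
};

// Returns a list of human-readable problems; an empty list means the format passes.
function checkAudioFormat({ sampleRate, encoding, channels, bitDepth }) {
  const problems = [];
  if (!SUPPORTED.sampleRates.includes(sampleRate)) {
    problems.push(`unsupported sample rate: ${sampleRate}`);
  }
  if (!SUPPORTED.encodings.includes(encoding)) {
    problems.push(`unsupported encoding: ${encoding}`);
  }
  if (!SUPPORTED.channels.includes(channels)) {
    problems.push(`unsupported channel count: ${channels}`);
  }
  if (!SUPPORTED.bitDepths.includes(bitDepth)) {
    problems.push(`unsupported bit depth: ${bitDepth}`);
  }
  return problems;
}

console.log(checkAudioFormat({ sampleRate: 8000, encoding: 'mulaw', channels: 1, bitDepth: 8 }));
console.log(checkAudioFormat({ sampleRate: 44100, encoding: 'mulaw', channels: 2, bitDepth: 8 }));
```

Returning a list of problems instead of a boolean makes it easier to report every mismatch in one pass.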

4. Control Event (Mark and Clear Events)

Purpose: Handles call control operations and state management.

Formi Internal Name: mark, clear

Requirements:

Clear Event: MANDATORY - Used to handle user interruptions during the call with the AI agent

Mark Event: OPTIONAL - Used for marking specific points in audio stream

Must support session termination events

Control Operations:

Audio buffer management: the clear event should discard all audio in the buffer that has not yet been played.

Session state changes
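The clear behaviour described above can be illustrated with a small provider-side playback queue. This is a sketch under assumptions: the class `PlaybackBuffer` and its method names are hypothetical and only model the rule that a clear event discards audio that has not yet been played.

```javascript
// Hypothetical provider-side playback buffer. Class and method names are
// illustrative; they are not part of Formi's API.
class PlaybackBuffer {
  constructor() {
    this.pending = []; // audio chunks received but not yet played
  }

  // Queue an audio chunk received from Formi for playback.
  enqueue(chunk) {
    this.pending.push(chunk);
  }

  // Take the next chunk to play, or undefined when the buffer is empty.
  next() {
    return this.pending.shift();
  }

  // Clear event: drop everything not yet played and report how many chunks were dropped.
  clear() {
    const dropped = this.pending.length;
    this.pending = [];
    return dropped;
  }
}

const buffer = new PlaybackBuffer();
buffer.enqueue('chunk-1');
buffer.enqueue('chunk-2');
buffer.enqueue('chunk-3');
buffer.next();               // chunk-1 starts playing
console.log(buffer.clear()); // the caller interrupts: 2 unplayed chunks are discarded
```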

Audio Event Specifications

Supported Audio Formats

Input Audio Formats (From Provider to Formi)

Binary PCM: Raw binary audio data

Base64 PCM: PCM audio encoded in Base64

μ-law Base64: μ-law encoded audio in Base64 format

Output Audio Formats (From Formi to Provider)

Base64 PCM: PCM audio encoded in Base64

μ-law Base64: μ-law encoded audio in Base64 format
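To make the μ-law Base64 format concrete, the sketch below decodes a Base64 string of G.711 μ-law bytes into 16-bit linear PCM samples. The function names are hypothetical, the code assumes a Node.js runtime (it uses the built-in `Buffer`), and it does not describe how Formi performs the conversion internally.

```javascript
// Decode one G.711 μ-law byte into a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // μ-law bytes are transmitted bit-inverted
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a Base64 string of μ-law bytes into an Int16Array of PCM samples.
// Buffer is the Node.js built-in; a browser would need a different Base64 decoder.
function decodeMuLawBase64(base64Audio) {
  const bytes = Buffer.from(base64Audio, 'base64');
  return Int16Array.from(bytes, value => muLawToPcm16(value));
}

// 0xFF is μ-law silence; 0x00 and 0x80 are the largest negative and positive values.
const sampleBase64 = Buffer.from([0xff, 0x00, 0x80]).toString('base64');
console.log(decodeMuLawBase64(sampleBase64));
```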

Audio Configuration Parameters

{
  "sample_rate": 8000,
  "encoding": "PCM_16_LE",
  "channels": 1,
  "bit_depth": 16,
  "chunk_size_ms": 20
}
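The parameters above determine how many bytes each audio chunk carries: samples per chunk = sample_rate × chunk_size_ms / 1000, and bytes per chunk = samples × channels × bit_depth / 8. The sketch below applies that arithmetic and splits a raw audio buffer accordingly; the function names `bytesPerChunk` and `splitIntoChunks` are hypothetical.

```javascript
// Size in bytes of one uncompressed audio chunk for a given configuration.
function bytesPerChunk({ sample_rate, channels, bit_depth, chunk_size_ms }) {
  const samplesPerChunk = (sample_rate * chunk_size_ms) / 1000;
  return samplesPerChunk * channels * (bit_depth / 8);
}

// Split a Uint8Array of raw audio into fixed-size chunks; the last one may be shorter.
function splitIntoChunks(audio, chunkBytes) {
  const chunks = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    chunks.push(audio.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

const config = { sample_rate: 8000, encoding: 'PCM_16_LE', channels: 1, bit_depth: 16, chunk_size_ms: 20 };
console.log(bytesPerChunk(config)); // 160 samples × 2 bytes = 320
```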

Audio Processing Rules

Sample Rate Conversion: Automatic conversion between different sample rates

Channel Conversion: Support for mono/stereo conversion

Format Conversion: Automatic encoding/decoding between supported formats

Buffer Management: Proper handling of audio chunks and streaming

Provider Implementation Requirements

Mandatory Event Validation Checklist

Before integrating with Formi, telephony providers must validate that their system supports all four event types:

Handshake Event: Connection establishment (connected) and disconnection (stop) events are implemented

Meta Event: Call initiation event with metadata is implemented

Audio Event: Bidirectional audio streaming is implemented

Control Event: At least the clear event is implemented

Mark Event: Optional - Stream marking capability

Event Mapping Configuration

Each provider needs to provide a configuration mapping their events to Formi's expected format:

{
  "provider": "your_provider_name",
  "events": {
    "incoming": {
      "message_patterns": {
        "connected": {
          "detection_pattern": {"event": "connection_established"},
          "extraction_map": {"status": "connection.status"}
        },
        "start": {
          "detection_pattern": {"event": "call_started"},
          "extraction_map": {"call_id": "call.id", "direction": "call.direction"}
        },
        "media": {
          "detection_pattern": {"event": "audio_data"},
          "extraction_map": {"audio": "payload.audio_data"}
        },
        "clear": {
          "detection_pattern": {"event": "buffer_clear"},
          "extraction_map": {"action": "control.action"}
        }
      }
    }
  }
}
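One way to read this configuration is sketched below. It assumes that a `detection_pattern` matches when every key/value pair equals the same top-level field of the incoming message, and that each `extraction_map` value is a dot-separated path into the message. Both semantics, and the helper names `getPath` and `mapProviderMessage`, are assumptions for illustration; Formi's adapter may interpret the template differently.

```javascript
// Read a nested value by dot path, e.g. getPath(msg, 'call.id').
function getPath(obj, path) {
  return path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Assumed semantics: a pattern matches when every key/value in detection_pattern
// equals the same top-level field of the message.
function mapProviderMessage(config, message) {
  const patterns = config.events.incoming.message_patterns;
  for (const [formiEvent, { detection_pattern, extraction_map }] of Object.entries(patterns)) {
    const matches = Object.entries(detection_pattern)
      .every(([key, expected]) => message[key] === expected);
    if (!matches) continue;
    const fields = {};
    for (const [field, path] of Object.entries(extraction_map)) {
      fields[field] = getPath(message, path);
    }
    return { event: formiEvent, fields };
  }
  return null; // no pattern matched
}

// Trimmed copy of the configuration above, with a made-up provider message.
const exampleConfig = {
  provider: 'your_provider_name',
  events: { incoming: { message_patterns: {
    start: {
      detection_pattern: { event: 'call_started' },
      extraction_map: { call_id: 'call.id', direction: 'call.direction' },
    },
    media: {
      detection_pattern: { event: 'audio_data' },
      extraction_map: { audio: 'payload.audio_data' },
    },
  } } },
};

console.log(mapProviderMessage(exampleConfig, {
  event: 'call_started',
  call: { id: 'abc-123', direction: 'inbound' },
}));
```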

Audio Stream Requirements

Continuous Streaming: Audio must be streamed continuously without gaps

Real-time Processing: Latency should be minimized (< 100ms recommended)

Buffer Management: Proper audio buffering to prevent dropouts

Error Handling: Graceful handling of audio processing errors

Format Consistency: Maintain consistent audio format throughout session

Error Handling

Providers should implement proper error handling for:

Connection failures

Audio format mismatches

Network interruptions

Buffer overflow/underflow

Invalid event formats

Integration Testing

Test Scenarios

Connection Test: Verify WebSocket connection establishment

Event Sequence Test: Validate all four mandatory events are sent in correct order

Audio Streaming Test: Confirm bidirectional audio streaming works

Error Recovery Test: Test handling of connection drops and recovery

Format Compatibility Test: Verify audio format conversion works correctly

Sample Test Implementation

// Example test for event validation
const testEvents = [
  'handshake/connected', // Must be present
  'meta/start',          // Must be present
  'audio/media',         // Must be present
  'control/clear'        // Must be present
  // 'control/mark'      // Optional
];

function validateProviderEvents(providerEvents) {
  const requiredEvents = testEvents.slice(0, 4); // First 4 are mandatory

  return requiredEvents.every(event =>
    providerEvents.some(pe => pe.mapsTo === event)
  );
}
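The check above confirms that the mandatory events exist. An ordering check for the Event Sequence Test could look like the following sketch, which uses Formi's internal event names (connected, start, media, clear, mark, stop). The function name `findOrderingError` and the exact rules (one connected, one start, nothing after stop) are assumptions layered on the documented sequence.

```javascript
// Documented order: connected, then start, then media/clear/mark, then stop.
const PHASE = { connected: 0, start: 1, media: 2, clear: 2, mark: 2, stop: 3 };

// Returns null when the sequence is valid, otherwise a description of the first problem.
function findOrderingError(events) {
  if (events.length === 0) return 'no events received';
  if (events[0] !== 'connected') return 'first event must be "connected"';
  let lastPhase = -1;
  for (const [index, event] of events.entries()) {
    const phase = PHASE[event];
    if (phase === undefined) return `unknown event "${event}" at position ${index}`;
    if (phase < lastPhase) return `"${event}" at position ${index} arrives too late`;
    if (phase === 2 && lastPhase < 1) return `"${event}" at position ${index} arrives before "start"`;
    // Only the streaming phase (media/clear/mark) may repeat.
    if (phase !== 2 && phase === lastPhase) return `duplicate "${event}" at position ${index}`;
    lastPhase = phase;
  }
  return null;
}

console.log(findOrderingError(['connected', 'start', 'media', 'clear', 'media', 'stop']));
console.log(findOrderingError(['connected', 'media', 'start']));
```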

Best Practices

For Telephony Providers

Event Ordering: Send events in the correct sequence (handshake → meta → audio/control → stop)

Audio Quality: Ensure high-quality audio with minimal noise and distortion

Latency Optimization: Minimize processing delays in audio pipeline

Resource Management: Properly manage memory and connection resources

Documentation: Provide clear documentation of your event formats and audio specifications

For Integration Teams

Configuration Testing: Thoroughly test event mapping configurations

Audio Format Validation: Verify audio format compatibility before production

Load Testing: Test system under realistic call volumes

Monitoring Setup: Implement proper logging and monitoring for debugging

Fallback Mechanisms: Implement fallback procedures for connection failures

Support and Documentation

For technical support and additional documentation:

Formi's API Documentation for configuring telephony

Twilio websocket documentation reference

Exotel websocket documentation reference
