Formi WebSocket Configuration Guide
Overview#
Formi enables intelligent AI-powered voice interactions over phone calls through real-time audio streaming. This guide is for any client or partner integrating third-party telephony services (e.g., Exotel, Twilio, Knowlarity, Convox, Tata Tele) with Formi using WebSockets.
This documentation explains how to:
Set up the WebSocket connection
Validate the set of events that Formi's system expects
Configure audio formatting that Formi's system expects
Establish a direct WebSocket connection with Formi
Stream and receive audio in real-time
It is designed for teams implementing either:
Unidirectional streams (call monitoring or transcription)
Bidirectional streams (interactive AI voice agents)
WebSocket Connection URL#
Connection Endpoint#
wss://<formi’s-domain>/ws-adapter/{provider}/{call_type}/{agent_id}/{outlet_id}/{virtual_number}?caller_id={caller_id}
<formi’s-domain>: staging-api-2.formi.co.in or api-2.formi.co.in
Required Path Parameters#
Parameter | Type | Description | Example |
---|---|---|---|
provider | string | Short name for telephony provider (lowercase) | twilio , exotel |
call_type | string | Direction of the call, one of inbound or outbound | inbound , outbound |
agent_id | integer | Unique identifier for the agent | 12345 |
outlet_id | integer | Unique identifier for the outlet/client | 67890 |
virtual_number | string | Unique identifier for the configured number | +1234567890 |
Optional Query Parameters#
Parameter | Type | Description | Default |
---|---|---|---|
caller_id | string | Customer's phone number | Auto-generated UUID |
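As an illustration, the endpoint template and parameters above could be assembled with a small helper. This is only a minimal sketch: `buildFormiUrl` and its option names are hypothetical and not part of Formi's API, and percent-encoding each path segment is an assumption about how the endpoint parses the path.

```javascript
// Hypothetical helper; the function and option names are illustrative only.
function buildFormiUrl({ domain, provider, callType, agentId, outletId, virtualNumber, callerId }) {
  if (callType !== 'inbound' && callType !== 'outbound') {
    throw new Error(`call_type must be "inbound" or "outbound", got "${callType}"`);
  }
  // Path order follows the template: provider/call_type/agent_id/outlet_id/virtual_number
  const segments = [provider.toLowerCase(), callType, agentId, outletId, virtualNumber]
    .map((segment) => encodeURIComponent(String(segment)));
  let url = `wss://${domain}/ws-adapter/${segments.join('/')}`;
  // caller_id is optional, so the query string is only added when a value is given.
  if (callerId !== undefined) {
    url += `?caller_id=${encodeURIComponent(String(callerId))}`;
  }
  return url;
}

const exampleUrl = buildFormiUrl({
  domain: 'staging-api-2.formi.co.in',
  provider: 'twilio',
  callType: 'inbound',
  agentId: 123,
  outletId: 456,
  virtualNumber: '1234567890',
  callerId: '919701966915',
});
console.log(exampleUrl);
```

Rejecting an unknown `call_type` early keeps a malformed URL from reaching the handshake stage.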
Example Connection URLs#
# Twilio inbound call
wss://staging-api-2.formi.co.in/ws-adapter/twilio/inbound/123/456/1234567890?caller_id=919701966915
# Exotel outbound call
wss://api-2.formi.co.in/ws-adapter/exotel/outbound/789/012/09876543211?caller_id=9701966915
Universal Telephony Adapter#
Formi's Universal Telephony Adapter is designed to work with any telephony provider regardless of their specific event naming conventions, payload structures, or audio encoding formats. The system adapts to different provider implementations while maintaining consistent internal processing.
Key Features#
Provider Agnostic: Works with any telephony provider's WebSocket implementation
Dynamic Event Mapping: Automatically maps provider-specific events to Formi's internal event types
Audio Format Flexibility: Supports multiple audio encoding formats and automatic conversion
Template-Based Configuration: Uses JSON templates for easy provider integration
Event Types and Requirements#
Formi expects four mandatory event types from all telephony providers. These events can have different names and payload structures across providers, but the core functionality must be present.
1. Handshake Event (Connected and Stop Event)#
Purpose: Establishes the WebSocket connection and confirms that the communication channel is ready at the start of the call, and sends a corresponding acknowledgement at the end of the call.
Formi Internal Name: connected, stop
Must be the first event sent after WebSocket connection
Should contain connection metadata
Confirms bidirectional communication capability
Session identifier (if available)
Provider-specific metadata
2. Meta Event (Start Event)#
Purpose: Initiates the call session and provides call context.
Formi Internal Name: start
Must be sent before audio streaming begins
Contains call setup information
Provides context for the AI system
Call direction (inbound/outbound)
Any additional call metadata
3. Audio Event (Media Event)#
Purpose: Streams real-time audio data between the telephony provider and Formi.
Formi Internal Name: media
MANDATORY: Must support bidirectional audio streaming
Must maintain consistent audio format throughout the session
Should handle audio buffering appropriately
Must support real-time streaming with minimal latency
Audio Format Requirements:
Sample rate: 8 kHz (preferred) or 16 kHz
Encoding: PCM, μ-law, or A-law
Channels: Mono (1 channel)
Bit depth: 8-bit or 16-bit
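The format requirements above can be checked before a stream is opened. The sketch below assumes a stream description object with `sampleRate`, `encoding`, `channels` and `bitDepth` fields. These field names and the encoding labels (`pcm`, `mulaw`, `alaw`) are hypothetical; only the allowed values come from the list above.

```javascript
// Allowed values taken from the Audio Format Requirements list.
const ALLOWED_SAMPLE_RATES = [8000, 16000];
const ALLOWED_BIT_DEPTHS = [8, 16];
// Hypothetical labels for PCM, μ-law and A-law; adjust to your own naming.
const ALLOWED_ENCODINGS = ['pcm', 'mulaw', 'alaw'];

// Returns a list of human-readable problems; an empty list means the format is acceptable.
function checkAudioFormat({ sampleRate, encoding, channels, bitDepth }) {
  const problems = [];
  if (!ALLOWED_SAMPLE_RATES.includes(sampleRate)) {
    problems.push(`unsupported sample rate: ${sampleRate}`);
  }
  if (!ALLOWED_ENCODINGS.includes(encoding)) {
    problems.push(`unsupported encoding: ${encoding}`);
  }
  if (channels !== 1) {
    problems.push(`audio must be mono, got ${channels} channels`);
  }
  if (!ALLOWED_BIT_DEPTHS.includes(bitDepth)) {
    problems.push(`unsupported bit depth: ${bitDepth}`);
  }
  return problems;
}

console.log(checkAudioFormat({ sampleRate: 8000, encoding: 'mulaw', channels: 1, bitDepth: 8 }));
```

Returning every problem at once, rather than failing on the first, makes it easier to fix a provider configuration in a single pass.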
4. Control Event (Mark and Clear Events)#
Purpose: Handles call control operations and state management.
Formi Internal Name: mark, clear
Clear Event: MANDATORY - Used to handle user interruptions during the call with the AI agent
Mark Event: OPTIONAL - Used for marking specific points in the audio stream
Must support session termination events
Audio buffer management: the clear event should discard all audio in the buffer that has not yet been played.
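The clear behaviour described above can be pictured as a playback queue on the provider side that is emptied when a clear event arrives. The following is a minimal sketch under that assumption. `PlaybackBuffer` and its methods are hypothetical names, not part of any provider SDK.

```javascript
// Hypothetical provider-side queue of audio chunks waiting to be played to the caller.
class PlaybackBuffer {
  constructor() {
    this.queue = [];
  }

  // Called for each audio chunk received for playback.
  enqueue(chunk) {
    this.queue.push(chunk);
  }

  // Called by the playback loop; returns undefined when nothing is pending.
  next() {
    return this.queue.shift();
  }

  // Called on a clear event: drops everything not yet played and
  // returns how many chunks were discarded.
  clear() {
    const dropped = this.queue.length;
    this.queue.length = 0;
    return dropped;
  }
}

const playback = new PlaybackBuffer();
playback.enqueue('chunk-1');
playback.enqueue('chunk-2');
playback.enqueue('chunk-3');
playback.next();                 // chunk-1 is played
console.log(playback.clear());   // caller interrupts: remaining chunks are dropped
```

Only unplayed audio is discarded; a chunk already handed to the playback loop is not recalled.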
Audio Event Specifications#
1. Binary PCM: Raw binary audio data
2. Base64 PCM: PCM audio encoded in Base64
3. μ-law Base64: μ-law encoded audio in Base64 format

1. Base64 PCM: PCM audio encoded in Base64
2. μ-law Base64: μ-law encoded audio in Base64 format
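To make the μ-law Base64 format concrete, the sketch below decodes a Base64 payload into 16-bit PCM samples using the standard G.711 μ-law expansion. It assumes a Node.js environment (for `Buffer`). The function names are illustrative and do not describe Formi's internal conversion.

```javascript
// Expand one 8-bit μ-law byte to a signed 16-bit PCM sample (G.711).
function mulawByteToPcm16(byte) {
  const u = ~byte & 0xff;            // μ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a Base64 string of μ-law bytes into an Int16Array of PCM samples.
function decodeMulawBase64(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = mulawByteToPcm16(bytes[i]);
  }
  return pcm;
}

// 0xff is μ-law silence; 0x00 and 0x80 are the two full-scale values.
const samplePayload = Buffer.from([0xff, 0x00, 0x80]).toString('base64');
console.log(Array.from(decodeMulawBase64(samplePayload)));
```

Each μ-law byte expands to one 16-bit sample, so the decoded audio occupies twice as many bytes as the payload.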
Audio Configuration Parameters#
{
  "sample_rate": 8000,
  "encoding": "PCM_16_LE",
  "channels": 1,
  "bit_depth": 16,
  "chunk_size_ms": 20
}
Audio Processing Rules#
1. Sample Rate Conversion: Automatic conversion between different sample rates
2. Channel Conversion: Support for mono/stereo conversion
3. Format Conversion: Automatic encoding/decoding between supported formats
4. Buffer Management: Proper handling of audio chunks and streaming
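For buffer management it helps to know how many bytes one chunk occupies. The sketch below derives that from the audio configuration parameters shown earlier (`sample_rate`, `channels`, `bit_depth`, `chunk_size_ms`). `bytesPerChunk` is a hypothetical helper, and it assumes uncompressed samples of `bit_depth` bits each.

```javascript
// Bytes in one chunk = samples per chunk × channels × bytes per sample.
function bytesPerChunk({ sample_rate, channels, bit_depth, chunk_size_ms }) {
  const samplesPerChunk = (sample_rate * chunk_size_ms) / 1000;
  return samplesPerChunk * channels * (bit_depth / 8);
}

// The configuration from the Audio Configuration Parameters section.
const audioConfig = {
  sample_rate: 8000,
  encoding: 'PCM_16_LE',
  channels: 1,
  bit_depth: 16,
  chunk_size_ms: 20,
};

// 8000 Hz × 0.020 s = 160 samples; 160 × 1 channel × 2 bytes = 320 bytes
console.log(bytesPerChunk(audioConfig));
```

If Base64 framing is used, the payload on the wire is about a third larger than this raw size.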
Provider Implementation Requirements#
Mandatory Event Validation Checklist#
Before integrating with Formi, telephony providers must validate that their system supports all four event types:
Handshake Event: Connection establishment and disconnection events are implemented
Meta Event: Call initiation event with metadata is implemented
Audio Event: Bidirectional audio streaming is implemented
Control Event: At least the clear event is implemented
Mark Event: Optional - Stream marking capability
Event Mapping Configuration#
Each provider needs to provide a configuration mapping their events to Formi's expected format:
{
  "provider": "your_provider_name",
  "events": {
    "incoming": {
      "message_patterns": {
        "connected": {
          "detection_pattern": {"event": "connection_established"},
          "extraction_map": {"status": "connection.status"}
        },
        "start": {
          "detection_pattern": {"event": "call_started"},
          "extraction_map": {"call_id": "call.id", "direction": "call.direction"}
        },
        "media": {
          "detection_pattern": {"event": "audio_data"},
          "extraction_map": {"audio": "payload.audio_data"}
        },
        "clear": {
          "detection_pattern": {"event": "buffer_clear"},
          "extraction_map": {"action": "control.action"}
        }
      }
    }
  }
}
Audio Stream Requirements#
1. Continuous Streaming: Audio must be streamed continuously without gaps
2. Real-time Processing: Latency should be minimized (< 100 ms recommended)
3. Buffer Management: Proper audio buffering to prevent dropouts
4. Error Handling: Graceful handling of audio processing errors
5. Format Consistency: Maintain consistent audio format throughout session
Error Handling#
Providers should implement proper error handling for:
Buffer overflow/underflow
Integration Testing#
Test Scenarios#
1. Connection Test: Verify WebSocket connection establishment
2. Event Sequence Test: Validate that all four mandatory events are sent in the correct order
3. Audio Streaming Test: Confirm bidirectional audio streaming works
4. Error Recovery Test: Test handling of connection drops and recovery
5. Format Compatibility Test: Verify audio format conversion works correctly
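The event sequence test could be automated by recording the Formi internal names of the events in the order they were sent and checking them against the ordering rules in this guide. The sketch below is one possible check, and `validateEventSequence` is a hypothetical name. It does not require a clear event, on the assumption that clear only occurs when the caller interrupts.

```javascript
// events: array of Formi internal event names in the order they were sent,
// e.g. ['connected', 'start', 'media', 'media', 'stop'].
// Returns a list of ordering violations; an empty list means the sequence is valid.
function validateEventSequence(events) {
  const errors = [];
  const startIndex = events.indexOf('start');
  const firstMediaIndex = events.indexOf('media');

  if (events[0] !== 'connected') {
    errors.push('first event must be "connected"');
  }
  if (startIndex === -1) {
    errors.push('missing "start" event');
  }
  if (firstMediaIndex === -1) {
    errors.push('no "media" events received');
  }
  if (startIndex !== -1 && firstMediaIndex !== -1 && firstMediaIndex < startIndex) {
    errors.push('"media" received before "start"');
  }
  if (events[events.length - 1] !== 'stop') {
    errors.push('last event must be "stop"');
  }
  return errors;
}

console.log(validateEventSequence(['connected', 'start', 'media', 'clear', 'media', 'stop']));
```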
Sample Test Implementation#
// Example test for event validation
const testEvents = [
  'handshake/connected', // Must be present
  'meta/start',          // Must be present
  'audio/media',         // Must be present
  'control/clear'        // Must be present
  // 'control/mark'      // Optional
];

// Returns true when every mandatory event has a provider event mapped to it.
// Each entry in providerEvents is expected to carry a `mapsTo` field.
function validateProviderEvents(providerEvents) {
  const requiredEvents = testEvents.slice(0, 4); // First 4 are mandatory
  return requiredEvents.every(event =>
    providerEvents.some(pe => pe.mapsTo === event)
  );
}
Best Practices#
For Telephony Providers#
1. Event Ordering: Send events in the correct sequence (handshake → meta → audio/control → stop)
2. Audio Quality: Ensure high-quality audio with minimal noise and distortion
3. Latency Optimization: Minimize processing delays in audio pipeline
4. Resource Management: Properly manage memory and connection resources
5. Documentation: Provide clear documentation of your event formats and audio specifications
For Integration Teams#
1. Configuration Testing: Thoroughly test event mapping configurations
2. Audio Format Validation: Verify audio format compatibility before production
3. Load Testing: Test system under realistic call volumes
4. Monitoring Setup: Implement proper logging and monitoring for debugging
5. Fallback Mechanisms: Implement fallback procedures for connection failures
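For configuration testing, one approach is to run sample provider messages through the event mapping configuration and inspect the result. The sketch below assumes semantics that this guide does not spell out: a message matches when every key in `detection_pattern` equals the same key in the message, and each `extraction_map` value is a dotted path into the message. `mapIncomingMessage` and `getPath` are hypothetical helpers, not Formi's implementation.

```javascript
// Resolve a dotted path such as "call.id" against a nested object.
function getPath(obj, path) {
  return path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Returns { event, fields } for the first matching pattern, or null if none match.
function mapIncomingMessage(config, message) {
  const patterns = config.events.incoming.message_patterns;
  for (const [formiEvent, rule] of Object.entries(patterns)) {
    // Assumed semantics: all detection_pattern keys must match exactly.
    const matches = Object.entries(rule.detection_pattern)
      .every(([key, value]) => message[key] === value);
    if (!matches) continue;

    const fields = {};
    for (const [field, path] of Object.entries(rule.extraction_map)) {
      fields[field] = getPath(message, path);
    }
    return { event: formiEvent, fields };
  }
  return null;
}

// A trimmed-down version of the mapping configuration shown earlier.
const sampleConfig = {
  provider: 'your_provider_name',
  events: { incoming: { message_patterns: {
    start: {
      detection_pattern: { event: 'call_started' },
      extraction_map: { call_id: 'call.id', direction: 'call.direction' },
    },
  } } },
};

const sampleMessage = { event: 'call_started', call: { id: 'abc-1', direction: 'inbound' } };
console.log(mapIncomingMessage(sampleConfig, sampleMessage));
```

A `null` result or an `undefined` field in the output points to a pattern or path in the configuration that does not match the provider's actual payload.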
Support and Documentation#
For technical support and additional documentation:
Modified at 2025-08-12 14:20:14