Endpoint
POST /v3/universal-ai/async (async)
Model string pattern: audio/speech_to_text_async/{provider}[/{model}]
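A minimal sketch of how the model string pattern above can be assembled. The helper name `build_model_string` is hypothetical and not part of any SDK; it only concatenates the documented segments and treats the model segment as optional.

```python
from typing import Optional


def build_model_string(provider: str, model: Optional[str] = None) -> str:
    """Assemble audio/speech_to_text_async/{provider}[/{model}]."""
    if not provider:
        raise ValueError("provider is required")
    base = f"audio/speech_to_text_async/{provider}"
    # The model segment is optional; omit it to use the provider's default entry.
    return f"{base}/{model}" if model else base


default_model = build_model_string("openai")
specific_model = build_model_string("deepgram", "nova-3")
```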
Input
| Field | Type | Required | Description |
|---|---|---|---|
| file | file_input | Yes | Audio file ID from /v3/upload or direct file URL |
| language | string | No | Language code in ISO format (e.g., ‘en’, ‘fr’, ‘es’) |
| speakers | int | No | Number of speakers present in the audio |
| profanity_filter | bool | No | Whether to filter profanity and replace inappropriate words with a series of asterisks |
| vocabulary | array[string] | No | List of words or compound terms that the speech-to-text engine should recognize |
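The input fields above could be collected as in the sketch below. The helper `build_input` is hypothetical; it covers only the fields listed in the table and omits optional fields that are not set. How these fields are wrapped in the request body is not specified here, so the sketch stops at the input object itself.

```python
from typing import Any, Dict, List, Optional


def build_input(
    file: str,
    language: Optional[str] = None,
    speakers: Optional[int] = None,
    profanity_filter: Optional[bool] = None,
    vocabulary: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the input object; `file` is the only required field."""
    if not file:
        raise ValueError("file is required (upload ID or direct file URL)")
    if speakers is not None and speakers < 1:
        raise ValueError("speakers must be a positive integer")
    fields = {
        "file": file,
        "language": language,
        "speakers": speakers,
        "profanity_filter": profanity_filter,
        "vocabulary": vocabulary,
    }
    # Drop unset optional fields so only explicit values are sent.
    return {key: value for key, value in fields.items() if value is not None}


example_input = build_input(
    "https://example.com/meeting.mp3",  # placeholder URL, for illustration only
    language="en",
    speakers=2,
    vocabulary=["diarization"],
)
```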
Output
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | |
| diarization | object | Yes | |
| total_speakers | int | Yes | |
| entries | array[object] | No | |
| segment | string | Yes | |
| start_time | string | Yes | |
| end_time | string | Yes | |
| speaker | int | Yes | |
| confidence | float | Yes | |
| error_message | string | No | |
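A sketch of reading the output fields above. It assumes a nesting that the flat table does not state explicitly: `total_speakers` and `entries` sit inside `diarization`, and `segment`, `start_time`, `end_time`, `speaker`, and `confidence` belong to each entry. The sample payload is made up for illustration and is not real API output.

```python
from typing import Any, Dict, List, Tuple


def speaker_turns(output: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Merge consecutive entries from the same speaker into (speaker, text) turns."""
    # `entries` is optional in the table, so tolerate it being absent or null.
    entries = (output.get("diarization") or {}).get("entries") or []
    turns: List[Tuple[int, str]] = []
    for entry in entries:
        speaker, segment = entry["speaker"], entry["segment"]
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + segment)
        else:
            turns.append((speaker, segment))
    return turns


# Illustrative payload only; field nesting is an assumption.
sample_output = {
    "text": "hello there hi",
    "diarization": {
        "total_speakers": 2,
        "entries": [
            {"segment": "hello", "start_time": "0.0", "end_time": "0.4", "speaker": 0, "confidence": 0.98},
            {"segment": "there", "start_time": "0.4", "end_time": "0.8", "speaker": 0, "confidence": 0.95},
            {"segment": "hi", "start_time": "1.1", "end_time": "1.3", "speaker": 1, "confidence": 0.91},
        ],
    },
}

turns = speaker_turns(sample_output)
```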
Available Providers
| Provider | Model String | Price |
|---|---|---|
| amazon | audio/speech_to_text_async/amazon | $0.024 per 60 seconds |
| assembly | audio/speech_to_text_async/assembly | $0.011 per 60 seconds |
| deepgram (base) | audio/speech_to_text_async/deepgram/base | $0.0169 per 60 seconds |
| deepgram | audio/speech_to_text_async/deepgram | $0.0189 per 60 seconds |
| deepgram (enhanced) | audio/speech_to_text_async/deepgram/enhanced | $0.0189 per 60 seconds |
| deepgram (nova-3) | audio/speech_to_text_async/deepgram/nova-3 | $0.0052 per 60 seconds |
| gladia | audio/speech_to_text_async/gladia | $0.0102 per 60 seconds |
| google | audio/speech_to_text_async/google | $0.024 per 60 seconds |
| microsoft | audio/speech_to_text_async/microsoft | $0.0168 per 60 seconds |
| openai | audio/speech_to_text_async/openai | $0.006 per 60 seconds |
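To compare providers for a given recording, the listed prices can be turned into a rough estimate. The sketch below assumes cost scales linearly with duration; the table does not say whether billing is prorated per second or rounded up to full 60-second blocks, so treat the result as an approximation. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices in USD per 60 seconds, copied from the provider table.
PRICE_PER_60S = {
    "audio/speech_to_text_async/amazon": Decimal("0.024"),
    "audio/speech_to_text_async/assembly": Decimal("0.011"),
    "audio/speech_to_text_async/deepgram/base": Decimal("0.0169"),
    "audio/speech_to_text_async/deepgram": Decimal("0.0189"),
    "audio/speech_to_text_async/deepgram/enhanced": Decimal("0.0189"),
    "audio/speech_to_text_async/deepgram/nova-3": Decimal("0.0052"),
    "audio/speech_to_text_async/gladia": Decimal("0.0102"),
    "audio/speech_to_text_async/google": Decimal("0.024"),
    "audio/speech_to_text_async/microsoft": Decimal("0.0168"),
    "audio/speech_to_text_async/openai": Decimal("0.006"),
}


def estimate_cost(model: str, duration_seconds: float) -> Decimal:
    """Estimate cost in USD, assuming linear proration (an assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert via str to avoid binary floating-point noise in the Decimal.
    return PRICE_PER_60S[model] * Decimal(str(duration_seconds)) / Decimal(60)


# Cheapest listed option for a 10-minute file under the linear assumption.
cheapest = min(PRICE_PER_60S, key=lambda m: estimate_cost(m, 600))
```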
Quick Start
This is an async feature. The initial response returns a job ID. Poll GET /v3/universal-ai/async/{job_id} until the job completes.
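The polling step can be sketched as follows. Several details are assumptions rather than documented behavior: the base URL is a placeholder, the bearer-token header is a guess at the auth scheme, and the `status` field with terminal values `"success"` and `"failed"` is hypothetical. Check the actual job response schema before relying on any of these.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Iterable

BASE_URL = "https://api.example.com"  # placeholder; replace with the real API host


def http_fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET /v3/universal-ai/async/{job_id}; the auth header is an assumption."""
    request = urllib.request.Request(
        f"{BASE_URL}/v3/universal-ai/async/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    terminal_statuses: Iterable[str] = ("success", "failed"),  # hypothetical values
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_job until the job reports a terminal status or the timeout passes."""
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

With a real key, usage would look like `poll_job(lambda jid: http_fetch_job(jid, api_key), job_id)`. Passing the fetch function in as a parameter keeps the loop testable without network access.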