Realtime 1.5 Max
TTS: converts written text into spoken audio (e.g. ElevenLabs, Cartesia).
Status: fact-checked, all required fields present, publicly visible
Real-time
speech.dev has not published an operational assessment for this entry yet. Facts below are source-linked where available.
Overview
- Publisher: Inworld
- Lab: inworld
- Last updated: 2026-05-23
- Modality: TTS (converts written text into spoken audio; e.g. ElevenLabs, Cartesia)
Languages
A multilingual voice model is trained on multiple languages. Quality varies enormously by language: "supports 40 languages" may be great in 3 and mediocre in the other 37.
en-us, es-es, fr-fr, de-de, it-it, ja-jp, ko-kr, pt-br, zh-cn
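A minimal sketch of checking a requested locale against the list above. The helper names (`normalize_locale`, `is_listed`) are hypothetical and not part of any Inworld SDK. The normalization rule (lowercase, underscores treated as hyphens) is an assumption about how callers might spell the tags.

```python
# Locale codes copied from the listing above.
LISTED_LOCALES = frozenset({
    "en-us", "es-es", "fr-fr", "de-de", "it-it",
    "ja-jp", "ko-kr", "pt-br", "zh-cn",
})


def normalize_locale(tag: str) -> str:
    """Lowercase a locale tag and use hyphens, e.g. 'pt_BR' -> 'pt-br'."""
    return tag.strip().replace("_", "-").lower()


def is_listed(tag: str) -> bool:
    """Return True if the tag appears in the listing (says nothing about quality)."""
    return normalize_locale(tag) in LISTED_LOCALES
```

A listed locale only means the vendor names it as supported. As noted above, quality can differ widely between languages, so a listening test per language is still worthwhile.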
Pricing
- Billing model: Per character
- Rate: $25 per 1M characters
Last verified May 23, 2026
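As a rough illustration of the per-character rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes that billed characters equal Python's `len()` of the input string. The vendor may count characters differently (for example whitespace or markup), so treat the result as an estimate. The function name is hypothetical.

```python
from decimal import Decimal

# Rate taken from the listing: $25 per 1M characters.
RATE_USD_PER_MILLION_CHARS = Decimal("25")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate synthesis cost, assuming one billed character per code point."""
    return Decimal(len(text)) * RATE_USD_PER_MILLION_CHARS / Decimal(1_000_000)
```

At this rate, a 2,000-character script comes to about $0.05, and one million characters to $25.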
Benchmarks
Speech Arena quality (AA)
Artificial Analysis TTS
Listed as: Realtime TTS 1.5 Max
Indexed on Artificial Analysis text-to-speech comparison; quality Elo from Speech Arena.
Verified 2026-05-23
Technical
- Streaming: Supported
- Hosting: saas-vendor-cloud
API access
WSS, HTTPS streaming
API endpoints
wss, https
SDK languages
python, typescript
Regions (vendor buckets)
Macro areas as published by the vendor (often broad; not a country list).
Compliance (vendor-published)
GDPR DPA, HIPAA BAA, SOC 2 Type II